Administration of the Batch System Server

The batch system server is a core component of a compute resource, so configuring it properly is important. System administrators influence the operation of the batch system mainly through two configurations: the queue configuration of the batch system server and the configuration of the scheduler.

Queue configuration

The queue configuration is done by executing qmgr on the batch system server, which opens a command prompt. At this prompt, the command print server shows the common configuration settings of the batch system, for example:

set server scheduling = True
set server acl_hosts = localhost
set server acl_hosts += <other hosts allowed to submit jobs>
set server managers = <user@host of batch system manager>
set server operators = <user@host of batch system operator>
set server default_queue = <queuename>
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 12
set server poll_jobs = False
set server log_level = 1

If problems occur with the batch system, you can, for example, raise the log level to get more detailed server logs:

set server log_level = 3
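
Server settings like the ones above can also be kept in a script and handed to qmgr in one go. The following is a minimal sketch, not part of the course material: it renders a Python dictionary as set server directives and, optionally, pipes them to qmgr on standard input. The host name wn01.example.org and both function names are hypothetical.

```python
import subprocess


def server_directives(settings):
    """Render {attribute: value} pairs as qmgr 'set server' directives."""
    lines = []
    for attr, value in settings.items():
        if isinstance(value, (list, tuple)):
            # The first entry replaces the list, later entries are appended with '+='.
            for i, item in enumerate(value):
                op = "=" if i == 0 else "+="
                lines.append(f"set server {attr} {op} {item}")
        else:
            lines.append(f"set server {attr} = {value}")
    return lines


def apply_with_qmgr(lines):
    """Feed the directives to qmgr on stdin.

    Only works on the batch system server, for a user with manager rights.
    """
    subprocess.run(["qmgr"], input="\n".join(lines) + "\n", text=True, check=True)


settings = {
    "scheduling": True,
    "acl_hosts": ["localhost", "wn01.example.org"],  # hypothetical worker node
    "log_level": 1,
}
directives = server_directives(settings)
print("\n".join(directives))
```

Calling apply_with_qmgr(directives) would apply the settings; it is deliberately not called here, so the sketch can be tried on any machine.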

Information about a particular queue in the batch system can be obtained with

list queue <queue name>

The output should look similar to the following:

Queue dech
        queue_type = Execution
        Priority = 100
        max_queuable = 100
        total_jobs = 0
        state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 
        max_running = 100
        resources_max.cput = 48:00:00
        resources_max.nodect = 1
        resources_max.walltime = 72:00:00
        resources_default.nodes = nodes=1:ppn=1
        resources_default.walltime = 72:00:00
        acl_group_enable = True
        acl_groups = dech
        mtime = Sun Mar 14 17:08:51 2010
        resources_assigned.nodect = 0
        enabled = True
        started = True

Advanced configuration

The MAUI scheduler allows system administrators to define rules that prioritize the jobs of certain users and/or groups and to set limits for users:

GROUPCFG[atlas] PRIORITY=2000
GROUPCFG[lhcb] PRIORITY=2000
USERCFG[specialk] PRIORITY=4000 MAXJOB=15
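
To make the effect of such a limit concrete, the sketch below parses the lines above and checks whether a user could start another job. It is a toy model and not MAUI's own logic: it assumes that MAXJOB caps the number of jobs a user may have active at the same time, and the function names are hypothetical.

```python
import re

CONFIG = """\
GROUPCFG[atlas] PRIORITY=2000
GROUPCFG[lhcb] PRIORITY=2000
USERCFG[specialk] PRIORITY=4000 MAXJOB=15
"""

CRED_LINE = re.compile(r"^(USERCFG|GROUPCFG)\[(\w+)\]\s+(.*)$")


def parse_credentials(text):
    """Map (kind, name) to its KEY=VALUE settings, e.g. ('USERCFG', 'specialk')."""
    creds = {}
    for line in text.splitlines():
        match = CRED_LINE.match(line.strip())
        if not match:
            continue
        kind, name, rest = match.groups()
        settings = dict(pair.split("=", 1) for pair in rest.split())
        creds.setdefault((kind, name), {}).update(settings)
    return creds


def may_start(creds, user, active_jobs):
    """Assumed semantics: a user without MAXJOB is unlimited, otherwise active < MAXJOB."""
    limit = creds.get(("USERCFG", user), {}).get("MAXJOB")
    return limit is None or active_jobs < int(limit)


creds = parse_credentials(CONFIG)
print(may_start(creds, "specialk", 14), may_start(creds, "specialk", 15))
```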

Additionally, MAUI allows the creation of so-called standing reservations. This type of reservation is important, e.g., to give special virtual organizations such as ops direct access to resources. The example below shows a standing reservation named sam. It is valid 24/7 and reserves two job slots exclusively for members of the local user group ops.

The SPACEFLEX flag is an interesting feature: it allows MAUI to move the reserved job slots to other nodes if required (e.g. if the node providing the two slots breaks down).

SRCFG[sam] TASKCOUNT=1 RESOURCES=PROCS:2
SRCFG[sam] PERIOD=INFINITY
SRCFG[sam] STARTTIME=00:00:00 ENDTIME=24:00:00
SRCFG[sam] GROUPLIST=ops
SRCFG[sam] FLAGS=SPACEFLEX
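
The two reserved job slots follow from TASKCOUNT=1 and RESOURCES=PROCS:2, i.e. one task of two processors. The sketch below merges the SRCFG lines of a reservation and computes that product. It is an illustration only, and it assumes that several resources in RESOURCES would be separated by semicolons; the function names are hypothetical.

```python
import re

CONFIG = """\
SRCFG[sam] TASKCOUNT=1 RESOURCES=PROCS:2
SRCFG[sam] PERIOD=INFINITY
SRCFG[sam] STARTTIME=00:00:00 ENDTIME=24:00:00
SRCFG[sam] GROUPLIST=ops
SRCFG[sam] FLAGS=SPACEFLEX
"""

SR_LINE = re.compile(r"^SRCFG\[(\w+)\]\s+(.*)$")


def parse_reservations(text):
    """Merge all SRCFG[name] lines into one settings dict per reservation."""
    reservations = {}
    for line in text.splitlines():
        match = SR_LINE.match(line.strip())
        if match:
            name, rest = match.groups()
            pairs = (pair.split("=", 1) for pair in rest.split())
            reservations.setdefault(name, {}).update(pairs)
    return reservations


def reserved_procs(cfg):
    """Processors reserved in total: TASKCOUNT times PROCS per task."""
    for item in cfg["RESOURCES"].split(";"):  # ';' separator is an assumption
        if item.startswith("PROCS:"):
            return int(cfg["TASKCOUNT"]) * int(item.split(":", 1)[1])
    raise ValueError("no PROCS entry in RESOURCES")


sam = parse_reservations(CONFIG)["sam"]
print(sam["GROUPLIST"], reserved_procs(sam))
```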
