Testing the Batchsystem Server

From Gridkaschool
Revision as of 18:22, 28 December 2012 by Pweber

Because no worker nodes are available to the batch system server yet, only a limited set of functions can be tested at this point. You can at least check whether the yaim configuration step was successful.

  • Check if the hostname of the batch system server is set properly in the file /var/spool/pbs/server_name
cat /var/spool/pbs/server_name
  • Check if the list of worker nodes is set in the file /var/spool/pbs/server_priv/nodes
cat /var/spool/pbs/server_priv/nodes
  • Check if the Torque process is running
ps aux | grep -i pbs
  • Check if the Maui process is running
ps aux | grep -i maui
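
The four checks above could be wrapped in small shell functions, as in the following minimal sketch. The function names (check_server_name, count_nodes, is_running) are hypothetical helpers and not part of Torque or Maui. The sketch assumes that pgrep is installed, that the daemons are named pbs_server and maui, and that the nodes file lists one worker node per line.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Torque or Maui).

# Succeeds if FILE is readable and contains exactly the EXPECTED hostname.
check_server_name() {
    file=$1
    expected=$2
    [ -r "$file" ] || return 1
    [ "$(tr -d '[:space:]' < "$file")" = "$expected" ]
}

# Prints the number of non-blank lines in FILE
# (assumption: one worker node per line).
count_nodes() {
    grep -c '[^[:space:]]' "$1"
}

# Succeeds if a process with exactly this name is running.
is_running() {
    pgrep -x "$1" > /dev/null 2>&1
}
```

For example, check_server_name /var/spool/pbs/server_name "$(hostname -f)" compares the file with the fully qualified host name (assuming that is the name yaim was configured with), count_nodes /var/spool/pbs/server_priv/nodes reports how many worker nodes are listed, and is_running pbs_server or is_running maui can stand in for the ps and grep pipelines.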

If one of the processes is not running, start it:

/etc/init.d/maui start
/etc/init.d/pbs_server start
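
A minimal sketch of starting a daemon only when it is not already running. start_if_stopped is a hypothetical helper introduced here for illustration. It assumes that pgrep is installed and that the init script accepts a start argument.

```shell
#!/bin/sh
# Hypothetical helper: run "<init script> start" unless a process
# with the exact name given in $1 is already running.
start_if_stopped() {
    name=$1
    script=$2
    if pgrep -x "$name" > /dev/null 2>&1; then
        echo "$name is already running"
    else
        echo "starting $name"
        "$script" start
    fi
}

# Example calls (run as root on the batch system server):
#   start_if_stopped pbs_server /etc/init.d/pbs_server
#   start_if_stopped maui       /etc/init.d/maui
```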

If the pbs process is running, you can run qmgr on the command line. This command lets you manage queues manually. To display the configuration of the batch system server, enter the following at the qmgr prompt

print server

and to list only the configuration of a single queue

list queue <queue_name>
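
The same qmgr directives can also be passed non-interactively with the -c option, which may be convenient in scripts. The sketch below wraps this in two hypothetical helper functions. The QMGR variable is an assumption introduced here so that the command can be swapped out for a dry run.

```shell
#!/bin/sh
# Hypothetical wrappers around "qmgr -c".
# QMGR can be set to another command for a dry run.
QMGR=${QMGR:-qmgr}

# Print the full configuration of the batch system server.
print_server() {
    "$QMGR" -c "print server"
}

# List the configuration of the single queue named in $1.
list_queue() {
    "$QMGR" -c "list queue $1"
}

# Example calls:
#   print_server
#   list_queue test
```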

Other tools

The tools described next can only be tested once a worker node has been installed and the Torque and Maui processes are running.

Check the status of the available worker nodes with the pbsnodes command

pbsnodes -l  # lists the nodes that are down, offline, or in an unknown state

The output should show that all worker nodes configured in your worker node list are down:

gks-1-123.fzk.de     down
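
A minimal sketch for extracting the names of the nodes reported as down, assuming the two-column layout (node name, then state) shown above. down_nodes is a hypothetical helper that reads the pbsnodes -l output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: reads "pbsnodes -l" style output on stdin and
# prints the names of nodes whose state field contains "down".
down_nodes() {
    awk '$2 ~ /down/ { print $1 }'
}

# Example: pbsnodes -l | down_nodes
```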

Print the attributes for a single worker node

pbsnodes -a <hostname of the worker node>

Print queue information via the qstat command (e.g. for a queue named test)

qstat -Q test
Queue              Max   Tot   Ena   Str   Que   Run   Hld   Wat   Trn   Ext T         
----------------   ---   ---   ---   ---   ---   ---   ---   ---   ---   --- -         
test               100     0   yes   yes     0     0     0     0     0     0 E 
----
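
In this output, the Ena and Str columns indicate whether the queue is enabled and started. A minimal sketch of checking both in a script, assuming the column order shown above. queue_is_open is a hypothetical helper that reads the qstat -Q output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: reads "qstat -Q" output on stdin and succeeds only
# if the queue named in $1 is listed with Ena = yes and Str = yes
# (columns 4 and 5 in the layout shown above).
queue_is_open() {
    awk -v q="$1" '
        $1 == q { found = 1; ok = ($4 == "yes" && $5 == "yes") }
        END     { exit !(found && ok) }
    '
}

# Example: qstat -Q test | queue_is_open test && echo "queue test accepts and runs jobs"
```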

Try to figure out what the following commands do:

  • checkjob
  • diagnose
  • showq
  • showres
  • tracejob
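
Before experimenting, it can help to confirm which of these tools are installed on the server. checkjob, diagnose, showq and showres are Maui commands, while tracejob belongs to Torque. The following is a minimal sketch; missing_tools is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: prints each given command name that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Example: missing_tools checkjob diagnose showq showres tracejob
```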

Remember some of them! You can use them later when submitting jobs from the CREAM compute element to the batch system.


Now you can proceed to the batch system server administration

