Delayed Job Submission

Problem: When the gridseed virtual machines are booted for the first time, job completion time increases significantly. A simple job took about 10-15 minutes to complete, whereas it should take about 4-5 minutes. The same behaviour can be observed after restarting the WMS or a CE.

Euindia VO observations


A test job is run every 15 minutes on the two computing elements registered with the euindia VO, ce-01.grid.sissa.it and serv07.hep.phy.cam.ac.uk, and the time taken to complete the job (the difference between the timestamps of the job's Done status and its Running status) is recorded here.
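The completion time used in these observations is simply the gap between the Running and Done timestamps. A minimal sketch of that arithmetic follows. The timestamp format, the function names and the 5-minute threshold (taken from the expected 4-5 minute run time above) are assumptions for illustration only; they are not the output format of any WMS tool.

```python
from datetime import datetime

# Assumed timestamp layout; adjust to whatever the job status log really uses.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def completion_minutes(running_at: str, done_at: str) -> float:
    """Return the minutes elapsed between the Running and Done timestamps."""
    start = datetime.strptime(running_at, TIME_FORMAT)
    end = datetime.strptime(done_at, TIME_FORMAT)
    if end < start:
        raise ValueError("Done timestamp is earlier than Running timestamp")
    return (end - start).total_seconds() / 60.0


def is_delayed(minutes: float, threshold: float = 5.0) -> bool:
    """Flag a job that took longer than the expected time for a simple job."""
    return minutes > threshold


if __name__ == "__main__":
    # Hypothetical timestamps, not measured data.
    elapsed = completion_minutes("2010-01-01 10:00:00", "2010-01-01 10:12:30")
    print(elapsed, is_delayed(elapsed))
```

Applying the same calculation to every test run gives one duration per job, which can then be compared before and after a restart or reconfiguration.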

Mercurio VO observations

1) Job completion time (for ce-01.grid.sissa.it / ce-02.grid.sissa.it / grid2) before the WMS restart and after the WMS reconfiguration: here

2) Job completion time (for ce-01.grid.sissa.it / ce-02.grid.sissa.it / grid2) after the WMS restart: here

Gridats VO observations

1) Job completion time (for ce-01.grid.sissa.it / ce-02.grid.sissa.it / grid.dmi.units.it) using the gridats WMS (wms01.grid.elettra.trieste.it): here

Gridseed VO observations

A test job is run every 5 minutes on the two computing elements registered with gridseed, ce-03.grid.seed and ce-04.grid.seed, and the time taken to complete the job (the difference between the timestamps of the job's Done status and its Running status) is recorded.

1) After configuration and before restart: here

2) After restarting the WMS / ce-03 / ce-04: here

Gridseed observations with the CREAM CE (ce-2.grid.seed)

1) After configuration and before restart: here

2) After restarting the WMS / ce-03 / ce-04: here


All data is available in Excel format here

Solution:
1) A partial solution is to update globus-gma.conf.
Add the following parameters to /opt/globus/etc/globus-gma.conf:
tick 30
stateage 30
These settings tell globus-gma to refresh the job list and the job states every 30 seconds. By default the CE is tuned for long production jobs; on a production CE under high load the usual values are 100/300 or 300/600.
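The edit can also be scripted so that it is safe to repeat. The sketch below assumes that globus-gma.conf holds one whitespace-separated "name value" pair per line, as in the two lines shown above. The function name set_gma_params is hypothetical, and the sketch only transforms text; reading and writing the file is left to the caller.

```python
def set_gma_params(conf_text: str, params: dict) -> str:
    """Return conf_text with each parameter in params set to its new value.

    Existing "name value" lines are rewritten in place; parameters that are
    not present yet are appended at the end. Other lines are kept unchanged.
    """
    seen = set()
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] in params:
            # Rewrite the existing setting with the requested value.
            out.append(f"{fields[0]} {params[fields[0]]}")
            seen.add(fields[0])
        else:
            out.append(line)
    for name, value in params.items():
        if name not in seen:
            # Parameter was missing from the file, so append it.
            out.append(f"{name} {value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical starting content, not the real default file.
    print(set_gma_params("tick 300\n", {"tick": 30, "stateage": 30}), end="")
```

Running the function twice on its own output changes nothing, so it can be applied again after a reconfiguration without duplicating lines.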

Note: for the gridseed computing elements we have already reduced these values.

2) Reconfiguring the WMS / CE works most of the time.

i) Reconfigure the WMS with /opt/glite/yaim/bin/ig_yaim -c -s /root/config/ig-site-info.def -n ig_WMS -n ig_LB
ii) Reconfigure the computing element with /opt/glite/yaim/bin/ig_yaim -c -s /root/config/ig-site-info.def -n ig_CE_torque -n ig_BDII_site