Delayed Job Submission
Problem: When the gridseed virtual machines are booted for the first time, job completion time increases significantly. A simple job took about 10-15 min to complete, whereas it should take about 4-5 min. A similar slowdown was also observed after restarting the WMS or the CE.
Euindia VO observations
Test jobs were run every 15 min on the two computing elements registered with the euindia VO, ce-01.grid.sissa.it and serv07.hep.phy.cam.ac.uk, and the time taken to complete each job (the difference between the job Done status and the job Running status) was observed here.
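The measurement described above (Done time minus Running time) can be sketched as follows. This is only an illustration: the timestamp layout, the function names and the 5 min threshold are assumptions made for the example, not something taken from the test scripts used for these observations.

```python
from datetime import datetime

# Assumed timestamp layout; adjust it to whatever the job status output prints.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def completion_minutes(running_at: str, done_at: str) -> float:
    """Return the minutes elapsed between the Running and Done timestamps."""
    started = datetime.strptime(running_at, TIME_FORMAT)
    finished = datetime.strptime(done_at, TIME_FORMAT)
    if finished < started:
        raise ValueError("Done timestamp is earlier than Running timestamp")
    return (finished - started).total_seconds() / 60.0


def is_delayed(minutes: float, expected_max: float = 5.0) -> bool:
    """Flag a run that took longer than the 4-5 min expected for a simple job."""
    return minutes > expected_max
```

For example, a job with hypothetical timestamps 10:00:00 (Running) and 10:12:30 (Done) on the same day gives 12.5 min, which is_delayed would flag as slower than expected.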
Mercurio VO observations
1) Job completion time (for ce-01.grid.sissa.it / ce-02.grid.sissa.it / grid2) before WMS restart and after reconfiguring the WMS here
2) Job completion time (for ce-01.grid.sissa.it / ce-02.grid.sissa.it / grid2) after WMS restart here
Gridats VO observations
1) Job completion time (for ce-01.grid.sissa.it / ce-02.grid.sissa.it / grid.dmi.units.it) using the gridats WMS (wms01.grid.elettra.trieste.it) here
Gridseed VO observations
Test jobs were run every 5 min on the two computing elements registered with gridseed, ce-03.grid.seed and ce-04.grid.seed, and the time taken to complete each job (the difference between the job Done status and the job Running status) was observed.
1) After configuration and before restart here
2) After restarting WMS / ce-03 / ce-04 here
Gridseed observations with CREAM CE (ce-2.grid.seed)
1) After configuration and before restart here
2) After restarting WMS / ce-03 / ce-04 here
All data is available in Excel format here
Solution:
1) A partial solution is to update globus-gma.conf.
Add the following parameters to
/opt/globus/etc/globus-gma.conf:
tick 30
stateage 30
This tells globus-gma to refresh the job list and job states every 30 seconds.
By default the CE is tuned for long production jobs; for a production CE with high load
the default values are 100/300 or 300/600.
Note: for the gridseed computing elements we have already reduced the values.
2) Reconfiguring the WMS / CE works most of the time.
i) Reconfigure the WMS with /opt/glite/yaim/bin/ig_yaim -c -s /root/config/ig-site-info.def -n ig_WMS -n ig_LB
ii) Reconfigure the computing element with /opt/glite/yaim/bin/ig_yaim -c -s /root/config/ig-site-info.def -n ig_CE_torque -n ig_BDII_site
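
The globus-gma.conf change from solution 1 could be applied repeatably with a small helper like the one below. It is a minimal sketch that assumes the file holds one "name value" setting per line, as in the two lines shown above; the function name set_gma_params is hypothetical and not part of any Globus or gLite tool.

```python
def set_gma_params(conf_text: str, params: dict) -> str:
    """Return conf_text with each parameter in params set to the given value.

    Assumes one whitespace-separated "name value" setting per line.
    """
    seen = set()
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        name = fields[0] if fields else None
        if name in params:
            # Rewrite the first occurrence; drop any later duplicates.
            if name not in seen:
                out.append(f"{name} {params[name]}")
                seen.add(name)
        else:
            # Keep unrelated settings, comments and blank lines untouched.
            out.append(line)
    # Append parameters that were not present in the file at all.
    for name, value in params.items():
        if name not in seen:
            out.append(f"{name} {value}")
    return "\n".join(out) + "\n"
```

To use it, read /opt/globus/etc/globus-gma.conf, pass its contents together with {"tick": 30, "stateage": 30}, and write the result back after keeping a backup copy. Running the helper a second time on its own output leaves the text unchanged.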

