TROUBLESHOOTING
1. Virtualbox Related
PROBLEM: The virtual machine doesn't start and exits with the error:
Failed to open a session for the virtual machine master. Failed to load VMMR0.r0 (VERR_SUPLIB_OWNER_NOT_ROOT). Unknown error creating VM (VERR_SUPLIB_OWNER_NOT_ROOT).
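SOLUTION: Check the ownership and the permissions of the files involved. The error code indicates that a file or directory VirtualBox depends on (typically its installation directory or one of the parent directories) is not owned by root.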
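A minimal sketch of such a check, assuming GNU stat is available. It prints the owner, octal mode and name of a directory and of each parent directory, so that any entry not owned by root stands out. The function name list_owners and the path /usr/lib/virtualbox are assumptions; adjust the path to where VirtualBox is installed on your host.

```shell
#!/bin/sh
# Sketch: show owner, octal mode and name for a path and every parent directory.
# Assumes GNU stat (the -c option).
list_owners() {
    dir=$1
    while :; do
        # Skip components that do not exist instead of failing.
        if [ -e "$dir" ]; then
            stat -c '%U %a %n' "$dir"
        fi
        # Stop at the filesystem root (or at "." for relative paths).
        if [ "$dir" = "/" ] || [ "$dir" = "." ]; then
            break
        fi
        dir=$(dirname "$dir")
    done
}

# /usr/lib/virtualbox is an assumed install location, not taken from the text above.
list_owners "${VBOX_DIR:-/usr/lib/virtualbox}"
```

Any line whose first field is not root is a candidate for the ownership problem.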
PROBLEM: VirtualBox cannot import an OVF file
SOLUTION: Check whether the <AttachedDevice> sections in the OVF file all reference the same image uuid. If they do, replace each one with the correct uuid (you'll find it in <DiskSection>).
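To spot the duplicated uuid quickly, the following sketch counts how often each uuid value occurs in the file. It assumes the uuids appear as uuid="..." attributes, possibly wrapped in braces; the function name count_uuids is hypothetical.

```shell
#!/bin/sh
# Sketch: count how often each uuid="..." value occurs in an OVF file.
# Braces are stripped so that "{abc}" and "abc" are counted as the same value.
count_uuids() {
    grep -o 'uuid="[^"]*"' "$1" | tr -d '{}' | sort | uniq -c
}

# Run on the file given as first argument, if any.
if [ -n "${1:-}" ]; then
    count_uuids "$1"
fi
```

A uuid that is counted once per attached device, while the other disk uuids are counted only once, suggests the situation described above.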
2. Unable to do uberftp on WMS
(https://savannah.cern.ch/bugs/?42478)
[mangesh@portal mercurio]$ uberftp wms.grid.sissa.it
220 wms.grid.sissa.it GridFTP Server 2.3 (gcc32dbg, 1144436882-63) ready.
530-Login incorrect. : globus_gss_assist: Error invoking callout
530-globus_callout_module: Error with dynamic library: couldn't dlopen /opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so: /opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so: cannot open shared object file: No such file or directory
Solution - The library is installed without the _gcc32 suffix, so create a symbolic link under the name the server is looking for:
[root@wms config]# ll /opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so
ls: /opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so: No such file or directory
[root@wms config]# ll /opt/glite/lib/liblcas_lcmaps_gt4_mapping.so
lrwxrwxrwx 1 root root 35 Oct 12 14:53 /opt/glite/lib/liblcas_lcmaps_gt4_mapping.so -> liblcas_lcmaps_gt4_mapping.so.0.0.0
[root@wms config]# ln -s /opt/glite/lib/liblcas_lcmaps_gt4_mapping.so /opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so
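If the fix has to be applied on several hosts, it may help to make it safe to run more than once. The sketch below creates the link only when nothing exists under the link name yet; the function name link_if_missing is hypothetical, the two paths are the ones shown above.

```shell
#!/bin/sh
# Sketch: create a symbolic link only if nothing exists under the link name yet.
# $1 = existing library, $2 = name the GridFTP callout tries to load
link_if_missing() {
    if [ -e "$2" ] || [ -L "$2" ]; then
        return 0
    fi
    ln -s "$1" "$2"
}

lib=/opt/glite/lib/liblcas_lcmaps_gt4_mapping.so
if [ -e "$lib" ]; then
    link_if_missing "$lib" /opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so
fi
```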
3. Unable to get proxy delegation
glite-wms-job-delegate-proxy -d $USER
Connecting to the service https://wms.grid.sissa.it:7443/glite_wms_wmproxy_server
Connection failed: Connection refused
connect failed in tcp_connect()
Error code: SOAP-ENV:Client
Solution - Restart the gLite services on the WMS, then restart the WMProxy and the LB proxy:
# service gLite restart
STOPPING SERVICES
*** glite-lb-bkserverd:
Stopping glite-lb-notif-interlogd ...done
Stopping glite-lb-bkserverd (11319) ... done
STARTING SERVICES
*** glite-lb-bkserverd:
Starting glite-lb-bkserverd ...[11969]: DB '(null)' done
Starting glite-lb-notif-interlogd ... done
# /opt/glite/etc/init.d/glite-wms-wmproxy restart
# /opt/glite/etc/init.d/glite-lb-proxy restart
4. Delay in job submission
Update globus-gma.conf so that the job status is refreshed more quickly. In this example the active settings are tick 30 and stateage 30:
[root@ce-3 ~]# cat /opt/globus/etc/globus-gma.conf
# fileage 86400
# stateage 600
# tout 60
# debug 1
tick 30
stateage 30
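Because most lines in this file are commented out, it is easy to misread which values are in effect. A small sketch that prints only the active lines; the function name active_settings is hypothetical, the path is the one shown above.

```shell
#!/bin/sh
# Sketch: print only the active (non-comment, non-blank) lines of a config file.
active_settings() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

conf=/opt/globus/etc/globus-gma.conf
if [ -r "$conf" ]; then
    active_settings "$conf"
fi
```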
5. Grid proxy verify.
A good reference: http://www.nikhef.nl/~janjust/proxy-verify/
Download grid-proxy-verify.c and compile it with:
$ gcc -o grid-proxy-verify grid-proxy-verify.c -lssl -lcrypto
The tool should not report any error for the proxy. A broken proxy gives output like:
[patil@ui-1 ~]$ ./grid-proxy-verify /tmp/x509up_u505
ERROR: Verifying proxy: Proxy subject must begin with the issuer.
ERROR: Verifying certificate chain: application verification failure
A valid proxy should give something like:
[mangesh@portal ~]$ ./grid-proxy-verify /tmp/x509up_u565
OK
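The proxy file name in the examples follows the usual /tmp/x509up_u<uid> pattern, which can be overridden with the X509_USER_PROXY environment variable. The sketch below works out the path for the current user and runs the verifier on it, assuming grid-proxy-verify was built in the current directory; the function name proxy_path is hypothetical.

```shell
#!/bin/sh
# Sketch: print the proxy file to check.
# X509_USER_PROXY, when set, overrides the default /tmp/x509up_u<uid> location.
proxy_path() {
    printf '%s\n' "${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}"
}

# Run the verifier only if it has been built in the current directory.
if [ -x ./grid-proxy-verify ]; then
    ./grid-proxy-verify "$(proxy_path)"
fi
```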
6. BDII is publishing old data
PROBLEM: If the host clock has been wrong at some point, a BDII may keep publishing data that is no longer up to date.
SOLUTION
On the affected host, go to the directory /opt/glite/var/cache/gip, which caches the information published by the BDII. The bdii service periodically checks whether anything has changed by looking at the dates of these files. If the current date on the host is earlier than the dates of the files, the system will not update them.
So the solution is simple. Stop the service:
# service bdii stop
Remove the cached files from /opt/glite/var/cache/gip, then restart the service:
# service bdii start
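To confirm that this is the cause before deleting anything, one can look for cache files dated in the future. A minimal sketch, assuming mktemp is available; the function name future_files is hypothetical, the cache path is the one given above.

```shell
#!/bin/sh
# Sketch: list files in a directory whose modification time is later than "now".
future_files() {
    ref=$(mktemp) || return 1   # freshly created, so its mtime is the current time
    find "$1" -type f -newer "$ref"
    rm -f "$ref"
}

cache=/opt/glite/var/cache/gip
if [ -d "$cache" ]; then
    future_files "$cache"
fi
```

Any file listed has a date later than the host's current time and will not be refreshed until the clock passes it or the file is removed.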
7. VOMS related
7.1 VERR_DIR & VERR_SIGN
When you have the following errors:
Warning - Unable to submit the job to the service: https://wms.research-infrastructures.eu:7443/glite_wms_wmproxy_server
Unable to retrive VOMS Proxy information: VERR_DIR
Method: jobSubmit
Switching to next WMProxy Server...
Error - Operation failed
Unable to find any endpoint where to perform service request
or
Connecting to the service https://wms.grid.sissa.it:7443/glite_wms_wmproxy_server
Warning - Unable to submit the job to the service: https://wms.grid.sissa.it:7443/glite_wms_wmproxy_server
Unable to retrive VOMS Proxy information: VERR_SIGN
Method: jobSubmit
...
This happens because glite-voms-server-config.py generates a "bad" file (/opt/glite/etc/voms/<VO>/voms.conf). To work around this "bug", edit that file and find the line containing:
... --uri=<hostname> ...
and add the VOMS service port (usually 8443) to this line:
... --uri=<hostname>:<port> ...
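The same edit can be sketched as a sed filter. It appends the port only when the --uri option has none yet, reads the configuration on standard input and writes the result to standard output, so the original file is left untouched until you have reviewed the result. The function name add_uri_port and the host name voms.example.org are made up for illustration.

```shell
#!/bin/sh
# Sketch: append ":<port>" to a --uri=<hostname> option that has no port yet.
# $1 = port number; the configuration is read from stdin and written to stdout.
add_uri_port() {
    sed -E 's/(--uri=[^[:space:]:]+)([[:space:]]|$)/\1:'"$1"'\2/'
}

# Example on a sample line (voms.example.org is a made-up host name).
echo '--uri=voms.example.org' | add_uri_port 8443
```

In practice the filter would be fed the voms.conf of the VO, with the output written to a new file that is compared against the original before replacing it.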
If the problem persists, check your /etc/grid-security/vomsdir directory.
YAIM has probably created a directory there named after each supported VO. Try simply renaming
them, then check whether the service works again.
Finally, restart the service with:
service gLite restart
Remember to regenerate the proxy certificate on the UI afterwards.
7.2 Voms-admin
When run as root, voms-admin uses the host credentials found in /etc/grid-security, so you have to add the host certificate as a VO admin:
/opt/glite/sbin/voms-db-deploy.py add-admin --vo <VO> --cert /etc/grid-security/hostcert.pem
8. Cream CE related errors
8.1 Integration between cream CE and pbs/torque
If, when checking a job's status, you get the following message:
The job cannot be submitted because the blparser service is not alive
check whether the BLAH blparser is running on the CREAM CE host with the following command:
/opt/glite/etc/init.d/glite-ce-blahparser status
This may happen because the pbs/torque server is installed in a location other than /usr/bin (the default installation path of the gLite torque RPMs). In that case you have to reconfigure the BLAH server for the new location.
Edit the file /opt/glite/etc/blah.config and change the values:
# Path where pbs executables are located
pbs_binpath=PBS_BIN_LOCATION
[cut]
# Path where the pbs logs are located (for pbs $pbs_spoolpath/server_logs)
pbs_spoolpath=PBS_LOGS_LOCATION
Now restart Tomcat:
service tomcat5 restart
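If you are not sure where the pbs executables were installed, the location can be derived from the qstat command, assuming it is in the PATH of the current user. This is only a sketch; the function name pbs_bin_dir is hypothetical.

```shell
#!/bin/sh
# Sketch: print the directory that contains the qstat command found in PATH.
pbs_bin_dir() {
    qstat_path=$(command -v qstat) || return 1
    dirname "$qstat_path"
}

if dir=$(pbs_bin_dir); then
    echo "pbs_binpath=$dir"
else
    echo "qstat not found in PATH" >&2
fi
```

The printed value is a candidate for PBS_BIN_LOCATION in blah.config.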
8.2 Integration between cream CE and pbs/torque (using yaim)
Edit the file /opt/glite/yaim/defaults/ig-site-info.pre and change:
#########################################
# Batch system configuration variables #
#########################################
# The path of the lrms commands
BATCH_BIN_DIR=<PBS_BIN_LOCATION>
9. Sites to check for help
http://indico.ifca.es/e-ciencia/index.php/Troubleshooting

