TROUBLESHOOTING

1.  VirtualBox related

PROBLEM: The virtual machine does not start and exits with the error:

Failed to open a session for the virtual machine master.
Failed to load VMMR0.r0 (VERR_SUPLIB_OWNER_NOT_ROOT).
Unknown error creating VM (VERR_SUPLIB_OWNER_NOT_ROOT).

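SOLUTION: Check the ownership and permissions of the files involved. This error usually means that the VirtualBox installation directory, or one of its parent directories, is not owned by root.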
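
As a rough aid for this check, the sketch below walks from a directory up to / and lists every component that is not owned by root. The helper name report_wrong_owner is made up for this example, /usr/lib/virtualbox is only an assumed install location, and stat -c is the GNU coreutils syntax.

```shell
#!/bin/sh
# Sketch: list every directory between the given absolute path and /
# whose owner is not the expected uid (default 0, i.e. root).
# report_wrong_owner is an illustrative helper, not part of VirtualBox.
report_wrong_owner() {
    dir=$1
    want=${2:-0}
    case $dir in
        /*) ;;
        *) echo "absolute path required" >&2; return 2 ;;
    esac
    while :; do
        owner=$(stat -c %u "$dir") || return 1   # GNU stat syntax
        [ "$owner" = "$want" ] || echo "$dir is owned by uid $owner"
        [ "$dir" = "/" ] && break
        dir=$(dirname "$dir")
    done
}

# Assumed install location; adjust to the local system:
# report_wrong_owner /usr/lib/virtualbox
```

No output means that every directory on the path is owned by root.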


PROBLEM: VirtualBox cannot import an OVF file

SOLUTION: Check whether the <AttachedDevice> entries in the OVF file all reference the same image UUID. If they do, replace each one with the correct UUID (you will find it in <DiskSection>).
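
To spot repeated UUIDs quickly, a sketch like the following may help. It assumes that the image references are written as <Image uuid="{...}"/> inside <AttachedDevice>; the function name and the file name master.ovf are illustrative only.

```shell
#!/bin/sh
# Sketch: print every image reference that appears more than once
# in the <Image uuid="..."/> elements of an OVF file.
# duplicate_image_uuids is an illustrative name.
duplicate_image_uuids() {
    grep -o '<Image uuid="[^"]*"' "$1" | sort | uniq -d
}

# Usage with a hypothetical file name:
# duplicate_image_uuids master.ovf
```

Empty output means that no UUID is repeated.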

2.  Unable to run uberftp against the WMS

(https://savannah.cern.ch/bugs/?42478)


[mangesh@portal mercurio]$ uberftp wms.grid.sissa.it
220 wms.grid.sissa.it GridFTP Server 2.3 (gcc32dbg, 1144436882-63) ready.
530-Login incorrect. : globus_gss_assist: Error invoking callout
530-globus_callout_module: Error with dynamic library: couldn't dlopen \
/opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so: \
/opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so: \
cannot open shared object file: No such file or directory

Solution - the library is installed without the _gcc32 suffix, so create a symbolic link with the expected name:

[root@wms config]# ll /opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so
ls: /opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so: No such file or directory
[root@wms config]# ll /opt/glite/lib/liblcas_lcmaps_gt4_mapping.so
lrwxrwxrwx  1 root root 35 Oct 12 14:53 /opt/glite/lib/liblcas_lcmaps_gt4_mapping.so -> liblcas_lcmaps_gt4_mapping.so.0.0.0
[root@wms config]# ln -s /opt/glite/lib/liblcas_lcmaps_gt4_mapping.so /opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so
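
If the fix has to be repeated on several hosts, the link creation can be wrapped so that it does nothing when the name already exists. This is only a sketch; ensure_link is a name invented for this example.

```shell
#!/bin/sh
# Sketch: create the symbolic link only when the expected name is missing.
# ensure_link is an illustrative helper name.
ensure_link() {
    target=$1
    link=$2
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "$link already exists, nothing to do"
    else
        ln -s "$target" "$link" && echo "created $link -> $target"
    fi
}

# The paths from the session above:
# ensure_link /opt/glite/lib/liblcas_lcmaps_gt4_mapping.so \
#             /opt/glite/lib/liblcas_lcmaps_gt4_mapping_gcc32.so
```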

3.  Unable to get proxy delegation


glite-wms-job-delegate-proxy -d $USER

Connecting to the service https://wms.grid.sissa.it:7443/glite_wms_wmproxy_server


Connection failed: Connection refused
connect failed in tcp_connect()
Error code: SOAP-ENV:Client

Solution -

# service gLite restart
STOPPING SERVICES
*** glite-lb-bkserverd:
Stopping glite-lb-notif-interlogd ...done
Stopping glite-lb-bkserverd (11319) ... done

STARTING SERVICES
*** glite-lb-bkserverd:
Starting glite-lb-bkserverd ...[11969]: DB '(null)'
 done
Starting glite-lb-notif-interlogd ... done


# /opt/glite/etc/init.d/glite-wms-wmproxy restart

# /opt/glite/etc/init.d/glite-lb-proxy restart
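
After the restart it can be useful to confirm that the WMProxy port accepts connections again. The sketch below only splits the service URL into host and port; endpoint_hostport is an invented name, 443 is assumed as the default when the URL has no port, and the commented probe assumes a netcat that supports -z.

```shell
#!/bin/sh
# Sketch: split a service URL into "host port" so the port can be probed.
# endpoint_hostport is an illustrative helper; 443 is an assumed default.
endpoint_hostport() {
    rest=${1#*://}          # drop the scheme
    hostport=${rest%%/*}    # drop the path
    case $hostport in
        *:*) echo "${hostport%%:*} ${hostport##*:}" ;;
        *)   echo "$hostport 443" ;;
    esac
}

# Example probe, assuming a netcat that supports -z is installed:
# set -- $(endpoint_hostport https://wms.grid.sissa.it:7443/glite_wms_wmproxy_server)
# nc -z "$1" "$2" && echo "port $2 on $1 is accepting connections"
```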

4.  Delay in job submission


Edit globus-gma.conf so that the job status is refreshed more often:

[root@ce-3 ~]# cat /opt/globus/etc/globus-gma.conf
# fileage 86400
# stateage 600
# tout 60
# debug 1
tick 30
stateage 30
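
To see at a glance which settings are actually in effect, the commented and blank lines can be filtered out. A minimal sketch; active_settings is an invented name.

```shell
#!/bin/sh
# Sketch: print only the active lines of a configuration file,
# skipping comments and blank lines. active_settings is an illustrative name.
active_settings() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# active_settings /opt/globus/etc/globus-gma.conf
```

For the file shown above this prints just the tick and stateage lines.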

5.  Grid proxy verification


A good reference: http://www.nikhef.nl/~janjust/proxy-verify/

Download grid-proxy-verify.c and compile it with:

$ gcc -o grid-proxy-verify grid-proxy-verify.c -lssl -lcrypto

A valid proxy should not produce errors such as:

[patil@ui-1 ~]$ ./grid-proxy-verify /tmp/x509up_u505
ERROR:  Verifying proxy: Proxy subject must begin with the issuer.
ERROR:  Verifying certificate chain: application verification failure

The output should instead look like:

[mangesh@portal ~]$ ./grid-proxy-verify /tmp/x509up_u565
OK
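
For use in scripts, the check can be reduced to a yes/no answer. The sketch below assumes that the tool prints exactly OK on success, as in the session above; proxy_is_valid is an invented name and the proxy path in the usage comment is only the usual default location.

```shell
#!/bin/sh
# Sketch: succeed only when the verifier prints exactly "OK".
# proxy_is_valid is an illustrative name; the first argument is the path
# to the compiled grid-proxy-verify binary, the second the proxy file.
proxy_is_valid() {
    [ "$("$1" "$2" 2>&1)" = "OK" ]
}

# if proxy_is_valid ./grid-proxy-verify "/tmp/x509up_u$(id -u)"; then
#     echo "proxy verified"
# else
#     echo "proxy verification failed" >&2
# fi
```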

6.  BDII is publishing old data


PROBLEM: If the host clock is wrong, a BDII may publish data that is out of date.

SOLUTION: On the affected host, go to the directory /opt/glite/var/cache/gip, where the information published by the BDII is cached. The BDII service periodically checks the dates of these files to detect changes. If the date on the host is earlier than the dates of the files, the files are never updated.

To fix this, stop the service:

# service bdii stop

Remove the files in that directory and start the service again:

# service bdii start
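
Before removing anything, it may be worth confirming that the cache really contains files dated in the future. A minimal sketch, assuming the host clock has already been corrected; future_files is an invented name.

```shell
#!/bin/sh
# Sketch: list files whose modification time is later than "now"
# according to the host clock. future_files is an illustrative name.
future_files() {
    ref=$(mktemp) || return 1   # reference file stamped with the current time
    find "$1" -type f -newer "$ref"
    rm -f "$ref"
}

# future_files /opt/glite/var/cache/gip
```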

7.  VOMS related


7.1  VERR_DIR & VERR_SIGN

When you have the following errors:

 Warning - Unable to submit the job to the service: 
 https://wms.research-infrastructures.eu:7443/glite_wms_wmproxy_server
 Unable to retrive VOMS Proxy information: VERR_DIR

 Method: jobSubmit

 Switching to next WMProxy Server...

 Error - Operation failed Unable to find any endpoint where to perform service request

or

Connecting to the service https://wms.grid.sissa.it:7443/glite_wms_wmproxy_server


Warning - Unable to submit the job to the service: 
https://wms.grid.sissa.it:7443/glite_wms_wmproxy_server
Unable to retrive VOMS Proxy information: VERR_SIGN

Method: jobSubmit

...

This happens because glite-voms-server-config.py generates a faulty configuration file (/opt/glite/etc/voms/<VO>/voms.conf). To work around this, edit that file and find the line containing:

...
--uri=<hostname>
...

and append the VOMS service port (usually 8443) to it:

...
--uri=<hostname>:<port>
...
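
The same edit can be previewed with sed. This is a sketch only: add_uri_port is an invented name, it assumes that the --uri option sits on a line of its own, and it prints the result instead of modifying the file.

```shell
#!/bin/sh
# Sketch: print the configuration with ":<port>" appended to a --uri line
# that has no port yet. add_uri_port is an illustrative name; lines that
# already contain a colon after --uri= are left untouched.
add_uri_port() {
    sed "s/^\(--uri=[^:]*\)\$/\1:$2/" "$1"
}

# Review the output before replacing the original (<VO> is a placeholder):
# add_uri_port /opt/glite/etc/voms/<VO>/voms.conf 8443
```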

If the problem persists, check the /etc/grid-security/vomsdir directory. YAIM has probably created a subdirectory named after each supported VO. Try renaming these subdirectories and check whether the service works again.

Finally, restart the service with:

service gLite restart

Remember to regenerate the proxy certificate on the UI.

7.2  Voms-admin

When run as root, voms-admin uses the host credentials found in /etc/grid-security, so you have to add this certificate as a VO admin:

/opt/glite/sbin/voms-db-deploy.py add-admin --vo <VO> --cert /etc/grid-security/hostcert.pem

8.  CREAM CE related errors


8.1  Integration between CREAM CE and PBS/Torque

If you get the following message when checking a job's status:

The job cannot be submitted because the blparser service is not alive

check whether the BLAH blparser is running on the CREAM CE host with the following command:

/opt/glite/etc/init.d/glite-ce-blahparser status

This can happen when the PBS/Torque server is installed in a location other than /usr/bin (the default installation path of the gLite Torque RPMs). In that case, reconfigure BLAH to use the new location.

Edit the file /opt/glite/etc/blah.config and change the values:

# Path where pbs executables are located
pbs_binpath=PBS_BIN_LOCATION

[cut]

# Path where the pbs logs are located (for pbs $pbs_spoolpath/server_logs)
pbs_spoolpath=PBS_LOGS_LOCATION

Then restart Tomcat:

service tomcat5 restart
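
A quick sanity check of the configured path can catch typos before the restart. The sketch below reads one key from a key=value file; config_value is an invented name, the quote stripping is an assumption about how values may be written, and qstat is used only as a representative PBS executable.

```shell
#!/bin/sh
# Sketch: read one key from a key=value file and strip optional quotes.
# config_value is an illustrative name.
config_value() {
    sed -n "s/^$2=//p" "$1" | tr -d "\"'" | tail -n 1
}

# Check that the configured directory really holds the PBS client tools:
# bin=$(config_value /opt/glite/etc/blah.config pbs_binpath)
# [ -x "$bin/qstat" ] || echo "qstat not found in $bin" >&2
```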

8.2  Integration between CREAM CE and PBS/Torque (using YAIM)

Edit the file /opt/glite/yaim/defaults/ig-site-info.pre and change:

#########################################
# Batch system configuration variables  #
#########################################

# The path of the lrms commands 
BATCH_BIN_DIR=<PBS_BIN_LOCATION>

9.  Sites to check for help

  http://indico.ifca.es/e-ciencia/index.php/Troubleshooting