Getting Started with ARC
Gridseed-ARC_UI command line interface
- ngsub: find suitable resources and submit a job
- ngstat: check the status of jobs and resources
- ngcat: display stdout, stderr of a running job
- ngget: retrieve the results of a finished job
- ngkill: stop a job
- ngclean: delete a job from a computing resource
- ngsync: find user's jobs
- ngls: list files on a storage resource or in job's sandbox
- ngcp: transfer files to and from cluster and storages
The Batch system model
ARC can be viewed as an extended batch queue system. Jobs are submitted from a submission node (Gridseed-ARC_UI) to a server node that acts as a resource manager (Gridseed-ARC_CE). Jobs are processed according to the priority they have within the system queue. Local resources are selected and allocated for the execution of the job. Results are passed back either to the submission node or to an accessible storage resource.
Anatomy of an ARC job
- The job is described using the extended Resource Specification Language (xRSL)
- ngsub is used to submit the job
- Runtime Environments are used to access pre-installed applications
- File transfer from the submission node or from an accessible storage resource to the Gridseed-ARC_CE
- ARC transforms the grid job into a local batch system job
- submission to the local batch system done and monitored by the ARC middleware
- retrieval of results either back to the submission node or to an accessible storage resource.
Authentication
Checking your certificate
Your personal certificate is split in two separate files stored in a
directory called .globus. We can check this by the following
command:
[gridseed01@ui-1 ~]$ ls -lrt .globus/
total 12
-r-------- 1 gridseed02 gridseed 1751 Jul 30 15:21 userkey.pem
-rw-r--r-- 1 gridseed02 gridseed 3882 Jul 30 15:21 usercert.pem
As you can see in the .globus directory userkey.pem (the private part of the key) and usercert.pem (the public part of your key i.e. the certificate itself) are stored with the right permissions.
These two files are effectively your public and private keys, which will be used for the authenticated connections with all the other grid elements. It is essential that they have the correct file permissions otherwise you won't be able to create a proxy.
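The permission requirement can be verified before trying to create a proxy. The following is a minimal sketch, not part of ARC or Globus: the helper name check_cert_perms is hypothetical, it assumes GNU stat (as found on Linux), and the accepted modes (400 or 600 for the key, 644 or stricter for the certificate) are an assumption based on the listing above.

```shell
#!/bin/bash
# check_cert_perms [DIR]: hypothetical helper, not an ARC or Globus command.
# Assumes GNU stat, where "stat -c %a" prints the octal file mode.
check_cert_perms() {
    local dir=${1:-$HOME/.globus} key_mode cert_mode
    key_mode=$(stat -c %a "$dir/userkey.pem") || return 2
    cert_mode=$(stat -c %a "$dir/usercert.pem") || return 2
    # The private key must not be accessible by group or others.
    case "$key_mode" in
        400|600) ;;
        *) echo "userkey.pem has mode $key_mode, expected 400 or 600"
           return 1 ;;
    esac
    # The certificate is public, but only the owner may write to it.
    case "$cert_mode" in
        644|444|600|400) ;;
        *) echo "usercert.pem has mode $cert_mode, expected 644 or stricter"
           return 1 ;;
    esac
    echo "permissions look fine"
}
```

Calling check_cert_perms without arguments inspects $HOME/.globus; a non-zero exit status means one of the two files should be fixed with chmod.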
We can then take a look at the full certificate by issuing the following command:
[gridseed01@ui-1 ~]$ grid-cert-info
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 25 (0x19)
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: O=GRIDSEED, DC=seed, DC=grid, CN=GRIDSEED CA 1
        Validity
            Not Before: Dec 5 13:54:45 2009 GMT
            Not After : Dec 5 13:54:45 2010 GMT
        Subject: O=GRIDSEED, DC=seed, DC=grid, OU=Personal Certificate, CN=Gridseed01 Gridseed
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (2048 bit)
                Modulus (2048 bit):
                    00:c1:e8:63:ab:04:b1:1a:c7:92:19:22:92:df:34:
                    35:df:52:49:09:71:1d:07:c3:2d:35:a9:ea:1a:7f:
                    72:92:4c:4a:1b:16:e5:72:b4:3e:fe:e0:a2:12:ba:
                    bf:c3:14:dc:c5:ee:64:56:ff:9f:67:58:fa:81:be:
                    d4:72:6e:6a:00:67:ce:1b:b1:ed:55:86:f0:b3:af:
                    fc:79:7b:43:d8:13:3e:5d:2b:35:c7:31:f0:fd:08:
                    bd:5a:80:8c:cb:c1:69:65:99:d4:38:a2:ac:9f:c2:
                    27:d4:13:41:61:5b:5c:45:cb:ca:69:37:63:aa:7a:
                    45:53:18:0d:89:dd:de:f1:16:0f:a7:64:75:0e:c0:
                    83:ac:02:62:23:36:36:b9:5a:e7:ee:7a:26:99:a7:
                    40:48:52:eb:b6:52:94:69:1a:4b:9d:2f:0f:1a:a0:
                    6b:2f:e2:bc:36:a0:c8:81:7e:7a:14:40:29:ac:4f:
                    09:47:4f:09:53:5f:59:b3:ea:88:7a:22:a3:56:2f:
                    55:2b:bf:75:57:9f:4a:3b:8c:f4:5a:ae:63:a4:0c:
                    e2:c7:f7:d4:80:10:12:a4:0d:cc:80:ff:04:d4:9c:
                    ba:76:93:8e:94:0e:1b:3a:71:21:ea:d6:12:0e:7b:
                    e3:07:34:fa:b7:3d:9a:7e:17:2b:e8:08:cd:9f:0c:
                    4e:d9
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment, Data Encipherment
            X509v3 Extended Key Usage:
                TLS Web Client Authentication, E-mail Protection
            X509v3 CRL Distribution Points:
                URI:http://ca.grid.seed/ca/ca.crl
            X509v3 Certificate Policies:
                Policy: 1.3.6.1.4.1.10403.10.1.7
                Policy: 1.2.840.113612.5.2.2.1
            X509v3 Subject Key Identifier:
                57:02:8E:B4:8A:37:D8:F1:EC:A0:BF:C9:D8:9F:3E:08:AD:91:13:79
            X509v3 Authority Key Identifier:
                keyid:2B:12:26:16:31:57:2D:17:AE:A0:09:B0:3B:8F:0C:33:68:23:CC:E3
                DirName:/O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
                serial:00
            X509v3 Subject Alternative Name:
                email:pippo@nowhere
    Signature Algorithm: sha1WithRSAEncryption
        5e:e5:5d:1f:bd:c3:a8:05:ab:f5:b6:f4:a9:53:90:77:9d:49:
        19:c2:7c:83:6f:88:37:05:ea:f8:83:af:1d:08:d0:e4:ca:6a:
        28:f2:57:e4:a3:05:04:2b:4c:98:82:fe:a4:d0:52:52:92:fd:
        bd:61:28:74:50:e9:93:58:14:e2:25:70:92:ba:1e:a0:ad:ae:
        bf:a2:94:3a:56:d7:4a:e5:b5:de:21:3b:06:f7:d0:94:a8:4f:
        64:1e:0b:20:3d:d5:3a:bb:69:31:06:0e:7e:30:ff:b5:bd:12:
        e1:be:e8:dd:24:05:13:6f:ea:1e:bc:42:8d:61:84:ac:51:65:
        d4:64:05:5d:9e:af:7f:8e:b3:af:33:d6:ff:d2:a2:ad:d2:51:
        3d:38:3d:ae:cb:53:69:19:5a:97:43:da:79:c8:87:04:0a:3f:
        f8:40:45:52:75:fd:4c:f1:3a:64:ec:ec:b6:fe:54:3e:00:33:
        15:ec:41:ba:0c:0e:f6:08:9c:55:8d:d1:1b:35:59:35:f9:23:
        92:10:22:4f:e4:37:3c:ca:63:e1:84:ea:8b:63:1f:cd:0d:9b:
        ea:60:b7:cb:21:27:93:19:b5:a8:7c:da:b7:b2:c4:27:54:4a:
        e3:fc:d7:3b:f8:86:a7:c7:c3:ae:e2:8f:03:47:23:43:53:2a:
        b1:4b:d4:4f
There are three interesting things to check here:
- dates of creation and of expiration
- the name and subject of the Certification Authority which issued the certificate,
- the Common Name (CN) of the certificate owner, and the certificate subject, which uniquely identifies the certificate owner.
Creation of a grid proxy
We are now ready to create a proxy from our certificate: by means of it we can now start interacting with the grid.
Please remember that the __passphrase__ requested is: gridseed
[gridseed01@ui-1 ~]$ grid-proxy-init
Your identity: /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
Enter GRID pass phrase for this identity:
Creating proxy ................................. Done
Your proxy is valid until: Sun Dec 6 02:56:53 2009
Check your grid proxy
Once your proxy has been created, you can gather information on it through the grid-proxy-info command.
[gridseed01@ui-1 ~]$ grid-proxy-info
subject : /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed/CN=279397512
issuer : /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
identity : /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
type : RFC 3820 compliant impersonation proxy
strength : 512 bits
path : /tmp/x509up_u1501
timeleft : 11:59:04
Alternatively, one can use the ngtest command with the -E (-certificate) option to print information about the installed user and CA certificates:
[gridseed01@ui-1 ~]$ ngtest -E
Certificate information:
Certificate: /home/gridseed01/.globus/usercert.pem
Subject name: /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
Valid until: Sun Dec 5 14:54:45 2010
Proxy: /tmp/x509up_u1501
Proxy-subject: /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed/CN=279397512
Valid for: 11 hours, 58 minutes, 54 seconds
Certificate issuer: /O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
CA-certificates installed:
/O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
/O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
Browsing the Information System
The ARC information system is an OpenLDAP-based system which is derived from the Globus Monitoring and Discovery Services framework. It uses BDII together with an updated MDS LDAP schema. Nordugrid computing resources publish information about their available resources using LDAP servers. Contact information for these LDAP servers is collected into the Nordugrid Information System.
ngstat command
The ngstat command is used for obtaining the status of jobs that have been submitted with ARC; with the -q option it can also be used to retrieve information about the status of a cluster resource.
USAGE: ngstat [options] [job ...]
Options:
  -a, -all                      all jobs
  -i, -joblist filename         file containing a list of jobids
  -c, -cluster [-]name          explicitly select or reject a specific cluster
  -C, -clustlist [-]filename    list of clusters to select or reject
  -s, -status statusstr         only select jobs whose status is statusstr
  -g, -giisurl url              url to a GIIS
  -G, -giislist filename        list of GIIS urls
  -q, -queues                   show information about clusters and queues
  -l, -long                     long format (more information)
  -t, -timeout time             timeout in seconds (default 20)
  -d, -debug debuglevel         from -3 (quiet) to 3 (verbose) - default 0
  -x, -anonymous                use anonymous bind for MDS queries (default)
  -X, -gsi                      use gsi-gssapi bind for MDS queries
  -v, -version                  print version information
  -h, -help                     print this help
Getting information on a cluster resource
Use ngstat to obtain information about a given resource (by default all ARC commands available on the Gridseed-ARC_UI point to the Gridseed-ARC_CE):
[gridseed01@ui-1 ~]$ ngstat -q -l
Cluster arc-ce.grid.seed
  Alias: GridSeed ARC test cluster
  Contact: gsiftp://arc-ce.grid.seed:2811/jobs
  Host Certificate Issuer: /O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
  Owner: Democritos
  Architecture: i686
  Operating System: Scientific Linux SL release 4.7 (Beryllium)
  Size of Scratch Directory: 7760510976 bytes
  Free Space in Scratch Directory: 5606735872 bytes
  Size of Cache Directory: 49999249408 bytes
  Free Space in Cache Directory: 49999249408 bytes
  Number of CPUs: 8
  Number of Used CPUs: 0
  Number of Jobs: 0
  Number of Computers: 8 processors: 1
  Cluster Support: root@localhost
  Installed Middleware:
    nordugrid-arc-0.6.3
    globus-4.0.8-0.12.el4ng
  Installed Runtime Environments:
    APPS/LIFE/TANDEM-09-04-01-1
  Network Access on Cluster Nodes: outbound
  Session directory lifetime: 3 days
  Trusted Certificate Authorities:
    /O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
    /O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
  Entry valid from: 2009-12-05 15:13:12
  Entry valid to: 2009-12-05 15:13:42
Queue default
  Status: active
  Number of Running Jobs: 0
  Max CPU Time: 2 days
  Max Wall Time: 3 days
  Scheduling Policy: FIFO
  Number of CPUs: 8
  Entry valid from: 2009-12-05 15:13:12
  Entry valid to: 2009-12-05 15:13:42
Using ldapsearch
The ARC information system can be queried using ldapsearch:
[gridseed01@ui-1 ~]$ ldapsearch -x -h arc-ce.grid.seed -p 2135 -b 'Mds-Vo-name=local,o=grid'
This command prints all the information published by the ARC information system following the Nordugrid/ARC LDAP schema.
Filter information from ldapsearch: list available applications
Display information about the application tags (runtime environments) available on a given computing resource:
[gridseed01@ui-1 ~]$ ldapsearch -x -h arc-ce.grid.seed -p 2135 -b 'Mds-Vo-name=local,o=grid' \
    | grep nordugrid-cluster-runtimeenvironment
nordugrid-cluster-runtimeenvironment: APPS/LIFE/TANDEM-09-04-01-1
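When only the names of the runtime environments are needed, for example to paste one into a job description, the attribute prefix can be stripped. This is a small sketch; list_rtes is a hypothetical helper name, and it relies only on the attribute name shown in the output above.

```shell
#!/bin/bash
# list_rtes: hypothetical helper. Reads LDIF text on stdin and prints one
# runtime environment name per line, sorted and without duplicates.
list_rtes() {
    sed -n 's/^nordugrid-cluster-runtimeenvironment:[[:space:]]*//p' | sort -u
}
```

It is meant to be placed at the end of the ldapsearch pipeline in place of grep, i.e. `ldapsearch ... | list_rtes`.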
Exercises
Introduction
Basic job workflow:
- Obtain access to a User Interface (Gridseed-ARC_UI)
- Request a user certificate from a Certification Authority
- Deploy the signed certificate on the User Interface
- Create grid proxy
- Write a job description
- Submit job
- Monitor the progress of the job
- Fetch the results
In your $HOME on the Gridseed-ARC_UI you will find the following folders:
[gridseed01@ui-1 ~]$ ls
results  tutorial_exercises
results will contain the output of the examples you run during this tutorial
tutorial_exercises contains relevant material for the exercises (xrsl files, scripts, input files); it is organized in three groups of exercises:
- Exercise 1: learn the basic usage of the infrastructure
- Exercise 2: learn how to use RTEs and storage resources by running X!Tandem application
- Exercise 3: learn how a new application is integrated in the infrastructure and how to run it
Exercise 1: basic usage
Enter folder tutorial_exercises/exercise_1
[gridseed01@ui-1 ~]$ cd tutorial_exercises/exercise_1/
File *README* explains in detail how to run each individual exercise.
[gridseed01@ui-1 exercise_1]$ cat README
Exercise_1_1
The very simple exercise_1_1.xrsl shows the basic components of a
job description:
(executable=/bin/env)
The executable to be submitted as a main task to a Local Resource Management System (LRMS). Executable is a file that has to be executed as the main process of the task. It could be either a precompiled binary, or a script. Users may transfer their own executables, or use the ones known to be already installed on the remote system (CE).
(stdout=std.out) (stderr=std.err)
The standard output and standard error files. Even if they are not listed as part of outputFiles, they will be staged back to the User Interface when the job is retrieved.
(gmlog=gridlog)
A name of the directory containing grid-specific diagnostics per job. This directory is kept in the session directory to be available for retrieval.
(jobname="Exercise_1_1")
User-specified job name. This name is meant for convenience of the user. It can be used to select the job while using the Gridseed-ARC_UI. It is also available through the Information System.
(cputime=10) (disk=1)
Example of resource specifications. With these attributes, the resource broker will try to match the job requirements with the available computing and storage resources. These parameters affect the matchmaking process for the resource selection, which determines where the job will be executed.
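Put together, the attributes described above form one job description. The sketch below writes such a file from the shell; the file name exercise_1_1_sketch.xrsl is hypothetical (chosen so that the tutorial's own exercise_1_1.xrsl is not overwritten), the leading & joins the attributes as in the other examples of this tutorial, and the order of the attributes is only illustrative.

```shell
#!/bin/bash
# Write a job description that combines the attributes described above.
# The file name is hypothetical; the tutorial ships its own exercise_1_1.xrsl.
xrsl_file=exercise_1_1_sketch.xrsl
cat > "$xrsl_file" <<'EOF'
&(executable=/bin/env)
(stdout=std.out)
(stderr=std.err)
(gmlog=gridlog)
(jobname="Exercise_1_1")
(cputime=10)
(disk=1)
EOF
echo "wrote $xrsl_file"
```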
Running exercise_1_1
- Check validity of proxy
grid-proxy-info
- Submit job
ngsub -f exercise_1_1.xrsl
If a job is successfully submitted, a job identifier (job ID) is printed to standard output. The job ID uniquely identifies the job while it is being executed. A typical job ID looks like the following:
gsiftp://arc-ce.grid.seed:2811/jobs/10308913211503407485
You should use this as a handle to refer to the job when doing other
job manipulations, such as querying job status (ngstat), killing
it (ngkill) or retrieving the result (ngget).
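Since the job ID is a plain gsiftp URL, its parts can be extracted with ordinary shell parameter expansion, which is handy in scripts that handle several jobs. A minimal sketch follows; the function names job_session_id and job_host are hypothetical.

```shell
#!/bin/bash
# job_session_id JOBID: print the last path component of the job ID,
# i.e. the name of the session directory on the computing element.
job_session_id() {
    printf '%s\n' "${1##*/}"
}

# job_host JOBID: print the host name of the computing element.
job_host() {
    rest=${1#*://}      # drop the "gsiftp://" scheme
    rest=${rest%%/*}    # keep "host:port"
    printf '%s\n' "${rest%%:*}"
}
```

For the job ID shown above, job_session_id prints the numerical string at the end of the URL and job_host prints arc-ce.grid.seed.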
- Check status of submitted job
- To query the job status, use ngstat. A job can be referred to either by the job ID that was returned by ngsub at submission time, or by its name if the job description specified a job name.
ngstat <jobid>
ngstat Exercise_1_1
- List content of job's sandbox
ngls <jobid>
- Check standard out of submitted job
ngcat <jobid>
- Check standard err of submitted job
ngcat -e <jobid>
- Check the internal grid error log of the job
ngcat -l <jobid>
- Once job status is reported as *FINISHED*
- Retrieve job result
ngget <jobid> -j
ngget Exercise_1_1 -j
- Results are retrieved from the Gridseed-ARC_CE to the Gridseed-ARC_UI, into your _$HOME/results_ folder
- The files to be retrieved by ngget should be described in the xRSL as follows:
(outputfiles=(file1 "")(file2 ""))
- By default, ngget will create in your $HOME/results a new folder with the same name as the remote session directory (typically a numerical string). This new directory will contain all the files listed for download in your job description. If you would like to store the files in another location, use the -dir option. The option -j will assign your job name to the local directory with the output files (be careful not to give all your jobs the same name when using this option).
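The check-then-retrieve steps above can be scripted. The sketch below only extracts the status from ngstat output; it assumes that ngstat prints a line of the form "Status: FINISHED" for each job, which should be verified against the output of the installed version. The helper name job_status is hypothetical.

```shell
#!/bin/bash
# job_status: hypothetical helper. Reads ngstat output on stdin and prints
# the value of the first "Status:" line (assumed output format).
job_status() {
    sed -n 's/^[[:space:]]*Status:[[:space:]]*//p' | head -n 1
}

# Possible use on the Gridseed-ARC_UI (not run here):
#   while [ "$(ngstat Exercise_1_1 | job_status)" != "FINISHED" ]; do
#       sleep 30
#   done
#   ngget Exercise_1_1 -j
```

A real script would also stop waiting on other terminal states (for example a failed or killed job), otherwise the loop never ends.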
Exercise_1_2
Modify exercise_1_2.xrsl to print out hostname instead of /bin/env
Follow instructions on README to run the exercise
Exercise_1_3
This is an example of running an executable and passing some arguments:
& (executable=/bin/echo) (arguments="Hello World")
will result in the execution of
/bin/echo "Hello World"
Follow instructions on README to run the exercise
Exercise_1_4
Modify exercise_1_4.xrsl to print kernel information to std.out:
uname -a
Follow instructions on README to run the exercise
Exercise_1_5
This is an example of running a bash script that is passed to the Gridseed-ARC_CE as part of the job submission
Check run_test.sh
#!/bin/bash
echo "[`date`] Start"
env
echo "---------------"
echo "Reading content of current folder [$PWD]"
ls -la

# Exercise 1_6
# Uncomment the following part
# echo "---------------"
# echo "Listing content of archive "
# if [ -s archive.tgz ]; then
#     tar tfz archive.tgz
# else
#     echo "File archive.tgz Not found"
#     exit 1
# fi

# Exercise 1_7
# Uncomment the following part
# echo "---------------" > output_list.txt
# echo "Listing content of archive " >> output_list.txt
# if [ -s archive.tgz ]; then
#     tar tfz archive.tgz >> output_list.txt
# else
#     echo "File archive.tgz Not found" >> output_list.txt
#     exit 1
# fi

# Exercise 1_8
# Uncomment the following part
# sleep 180

echo "[`date`] Done"
Follow instructions on README to run the exercise
ngls Exercise_1_5 lists the content of the job's sandbox on the Gridseed-ARC_CE
ngls should list also the file run_test.sh that has been transferred from the Gridseed-ARC_UI
check also with ngcat -l the file transfer operation:
Dec 06 11:30:35 Downloader started
Dec 06 11:30:35 Check user uploadable file: /run_test.sh
Dec 06 11:30:35 User has uploaded file
Exercise_1_6
In this exercise we show the use of the inputFiles directive
(inputFiles=(archive.tgz "gsiftp://arc-ce.grid.seed/repo/exercise_2/archive.tgz"))
Format: (inputFiles=(<filename> <location>) ... )
List of files to be copied to the Gridseed-ARC_CE before the execution, where:
<filename>
is a File name, local to the Gridseed-ARC_CE and always relative to the session directory
<location>
is the location of the file (gsiftp, https, ftp, http URLs, or a path, local to the submission node Gridseed-ARC_UI). If void ("", use the quotes!), the input file is taken from the submission directory.
In this example
gsiftp://arc-ce.grid.seed/repo
is the storage resource where the files have been pre-staged (it could be any remote storage resource)
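When a job needs several input files, the inputFiles attribute can be generated instead of typed by hand. The following sketch builds the attribute from pairs of file name and location, following the format given above; the function name xrsl_input_files is hypothetical.

```shell
#!/bin/bash
# xrsl_input_files NAME LOCATION [NAME LOCATION ...]
# Hypothetical helper: prints an inputFiles attribute in the format
# (inputFiles=(<filename> "<location>") ...). Pass "" as LOCATION for
# files taken from the submission directory.
xrsl_input_files() {
    out="(inputFiles="
    while [ "$#" -ge 2 ]; do
        out="$out($1 \"$2\")"
        shift 2
    done
    printf '%s)\n' "$out"
}
```

For example, `xrsl_input_files archive.tgz "gsiftp://arc-ce.grid.seed/repo/exercise_2/archive.tgz"` prints the same attribute that is used in this exercise.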
Modify run_test.sh to list the content of the tar archive named
"archive.tgz", which will be passed as an input file in the job description
(uncomment the Exercise_1_6 part in the run_test.sh script)
Follow instructions on README to run the exercise
Exercise_1_7
In this exercise we show the use of the outputFiles directive
(outputFiles=(output_list.txt ""))
Format: (outputFiles=(<string> <URL>) ... )
List of files to be retrieved once job is finished
string
File name, local to the Gridseed-ARC_CE
URL
URL of the remote file; if void ("", use the quotes!), the file is
retrieved on Gridseed-ARC_UI when ngget is issued.
In this example the file output_list.txt will be retrieved on Gridseed-ARC_UI.
Modify run_test.sh to output the list of the tar archive named
archive.tgz to a file named output_list.txt (uncomment the
Exercise_1_7 part in the run_test.sh script)
Follow instructions on README to run the exercise
Exercise_1_8
This example shows how to kill a submitted job using ngkill. A job can be killed at almost any stage of its processing through the Grid.
ngkill [options] [job ...]
Options:
  -a, -all                            all jobs
  -i, -joblist filename               file containing a list of jobIDs
  -c, -clusters                       show information about clusters
  -C, -clustlist [-]text filename     list of sites (clusters) to select or reject
  -s, -status statusstr               only select jobs whose status is statusstr
  -keep                               keep files on gatekeeper (do not clean)
  -t, -timeout time                   timeout for queries (default 40 sec)
  -d, -debug debuglevel               debug level, from -3 (quiet) to 3 (verbose) - default 0
  -x, -anonymous                      use anonymous bind for queries (default)
  -X, -gsi                            use GSI-GSSAPI bind for queries
  -v, -version                        print version information
  -h, -help                           print help page
Arguments:
  job ...                             list of job IDs and/or jobnames
Modify run_test.sh to introduce a 3-minute sleep
(uncomment the Exercise_1_8 part in the run_test.sh script)
Follow instructions on README to run the exercise
Exercise 2: Advanced use of the infrastructure (RTE, SE and group submit)
Runtime Environments provide a means to make software packages installed at the systems available on the Grid. Users can specify in the job description file that a specific runtime environment needs to be present in the target system. That avoids the need to send the actual application binary as part of the computation: only input files need to be sent.
In this group of exercises the X!Tandem application will be used
Enter folder tutorial_exercises/exercise_2
[gridseed01@ui-1 ~]$ cd tutorial_exercises/exercise_2/
File *README* explains in detail how to run each individual exercise.
[gridseed01@ui-1 exercise_2]$ cat README
xtandem_1
In this exercise a simple, pre-configured X!Tandem job will be submitted.
We will use input files directly available on the Gridseed-ARC_UI and the pre-defined TANDEM-09-04-01-1 RTE
List the content of xtandem_1.xrsl
[gridseed01@ui-1 exercise_2]$ cat xtandem_1.xrsl
&(executable="xtandem_run.sh")
(inputFiles=(input.xml "xtandem_input.xml"))
(outputFiles=(output.xml ""))
(stdout="std.out")
(stderr="std.err")
(gmlog=gridlog)
(jobname="XTANDEM_sample")
(runtimeEnvironment=APPS/LIFE/TANDEM-09-04-01-1)
The directive runtimeEnvironment
Format: (runTimeEnvironment=<string>)
<string> is the environment name
The site to submit the job to will be chosen by the Gridseed-ARC_UI among those advertising specified runtime environments. Before starting the job, the Gridseed-ARC_CE will set up environment variables and paths according to those requested.
In this example
APPS/LIFE/TANDEM-09-04-01-1
will be requested
On the Gridseed-ARC_CE the RTE is deployed in _$runtimedir_ folder and is a bash script that will be executed three times during the lifetime of a submitted job
[root@arc-ce ~]# cat /export/software/APPS/LIFE/TANDEM-09-04-01-1
#!/bin/bash
# shared directory for application installation
application_base_path='/export/apps/tandem-linux-09-04-01-1'
# version
tandem_version='09-04-01-1'
case "$1" in
0 )
# execution of RTE on the frontend prior submission of LRMS job
# Here site specific settings could be applied
# For example:
# SGE LRMS requires the setting of the Parallel Environment before submission to the LRMS:
# export joboption_nodeproperty_0=mpich
;;
1 )
# First execution of RTE on compute node - before starting execution of "executable" as specified in submitted xrsl
export TANDEM_LOCATION=$application_base_path
export TANDEM_TAXONOMY=$application_base_path/bin
export PATH=$TANDEM_LOCATION/bin:$PATH
export TANDEMRUN="$TANDEM_LOCATION/bin/tandem.exe"
;;
2 )
# Second and final execution of RTE on compute node - at the end of the submitted LRMS job
# Clean up
;;
* )
# Now, calling argument is wrong or missing.
# If call was made from NorduGrid ARC, it is considered
# an error. If this script is to be used also to initialize
# MPI environment for local jobs in cluster, raising error here
# could be improper.
return 1
;;
esac
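The three calls can be tried out locally without a computing element. The sketch below is only an illustration of the calling convention: it writes a hypothetical, much reduced RTE script to a temporary file and sources it with the stage number as first argument, the way the job script does with the real RTE. The variable DEMO_LOCATION, the path /export/apps/demo and the function run_rte_stage are made up for this example.

```shell
#!/bin/bash
# Write a hypothetical, minimal RTE script to a temporary file.
rte=$(mktemp)
cat > "$rte" <<'EOF'
case "$1" in
    0 ) ;;                                          # frontend, before LRMS submission
    1 ) export DEMO_LOCATION=/export/apps/demo ;;   # compute node, before the executable
    2 ) unset DEMO_LOCATION ;;                      # compute node, after the job
    * ) return 1 ;;                                 # wrong or missing argument
esac
EOF

# run_rte_stage FILE STAGE: source FILE with STAGE as its first argument.
run_rte_stage() {
    rte_file=$1
    set -- "$2"        # the sourced file sees the stage number as $1
    . "$rte_file"
}

run_rte_stage "$rte" 1
echo "after stage 1: DEMO_LOCATION=${DEMO_LOCATION:-unset}"
run_rte_stage "$rte" 2
echo "after stage 2: DEMO_LOCATION=${DEMO_LOCATION:-unset}"
```

Because the script is sourced rather than executed, the variables exported in stage 1 remain visible to whatever runs afterwards in the same shell, which is why the real RTE can prepare PATH and TANDEMRUN for the executable.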
Follow instructions on README to run the exercise
The command
ngcat -l xtandem_1
will show the details on when the requested RTE will be processed
...
if [ ! -z "$RUNTIME_CONFIG_DIR" ] ; then
if [ -r "${RUNTIME_CONFIG_DIR}/APPS/LIFE/TANDEM-09-04-01-1" ] ; then
runtimeenvironments="${runtimeenvironments}APPS/LIFE/TANDEM-09-04-01-1;"
source ${RUNTIME_CONFIG_DIR}/APPS/LIFE/TANDEM-09-04-01-1 1
...
if [ ! -z "$RUNTIME_CONFIG_DIR" ] ; then
if [ -r "${RUNTIME_CONFIG_DIR}/APPS/LIFE/TANDEM-09-04-01-1" ] ; then
source ${RUNTIME_CONFIG_DIR}/APPS/LIFE/TANDEM-09-04-01-1 2
...
xtandem_2
In this exercise the input file test_spectra.mgf will be staged
from a storage resource directly to the Gridseed-ARC_CE during the
submission process of the job. We will use a different input file
input.xml that will reference the spectra.mgf file from the
job's sandbox instead of the one already available in TANDEM_LOCATION
[gridseed01@ui-1 exercise_2]$ diff xtandem_input.xml xtandem_input_SE.xml
18c18
< <note type="input" label="spectrum, path">TANDEM_LOCATION/bin/test_spectra.mgf</note>
---
> <note type="input" label="spectrum, path">./spectra.mgf</note>
Modify xtandem_2.xrsl to stage the file xtandem_input_SE.xml as inputFile with the name input.xml
Modify xtandem_2.xrsl to stage the file test_spectra.mgf located at gsiftp://arc-ce.grid.seed:2811/repo/test_spectra.mgf as inputFile with the name spectra.mgf
Follow instructions on README to run the exercise
xtandem_3
In this exercise, a group of four X!Tandem jobs will be submitted
from a single .xrsl file
This example shows how a group submission takes place. Submissions are done sequentially (one after another), but the brokering process is based on the information gathered at the beginning of the submission (this saves several ldapsearch queries, thus reducing the submission time).
Follow instructions on README to run the exercise
Exercise_3: application integration
In this exercise a new application will be deployed on the Gridseed-ARC_CE. A new dedicated RTE (APPS/LIFE/RSPACE-0.81) will be defined and we will learn how to use it to execute the application.
Running rspace application locally
Application is normally called with the following configuration: within the current folder there should be:
./:
total 20
-rw-r--r-- 1 gridseed01 gridseed 857 Mar 27 2004 INPUT
drwxr-xr-x 2 gridseed01 gridseed 4096 Mar 27 2004 potentials

./potentials:
total 180
-rw-r--r-- 1 gridseed01 gridseed 79059 Mar 27 2004 H
-rw-r--r-- 1 gridseed01 gridseed 94156 Mar 27 2004 C
- The application is then called as follows:
rspace-0.81_i386-linux_SERIAL
Content of /APPS/LIFE/RSPACE-0.81 RTE
[root@arc-ce ~]# cat /export/software/APPS/LIFE/RSPACE-0.81
#!/bin/bash
# shared directory for application installation
application_base_path='/export/apps/rspace-0.81'
# version
rspace_version='0.81'
case "$1" in
0 )
# execution of RTE on the frontend prior submission of LRMS job
# Here site specific settings could be applied
# For example:
# SGE LRMS requires the setting of the Parallel Environment before submission to the LRMS:
# export joboption_nodeproperty_0=mpich
;;
1 )
# First execution of RTE on compute node - before starting execution of "executable" as specified in submitted xrsl
export RSPACE_LOCATION=$application_base_path
export RSPACERUN="$RSPACE_LOCATION/rspace-0.81_i386-linux_SERIAL"
;;
2 )
# Second and final execution of RTE on compute node - at the end of the submitted LRMS job
# Clean up
;;
* )
# Now, calling argument is wrong or missing.
# If call was made from NorduGrid ARC, it is considered
# an error. If this script is to be used also to initialize
# MPI environment for local jobs in cluster, raising error here
# could be improper.
return 1
;;
esac
Guidelines for creating an RSPACE ARC job
- Input files are located at gsiftp://arc-ce.grid.seed:2811/repo/RSPACE/
- Browse the content of the storage resource with ngls
- Modify exercise_3.xrsl to:
  - run the RSPACE application
  - stage the required input files from the remote storage location
- Submit the job
  - ngsub exercise_3.xrsl
- Verify that the results are produced as expected
- Modify exercise_3.xrsl to:
  - stage the output files in gsiftp://arc-ce.grid.seed:2811/repo/results/$LOGNAME/
- Re-submit the modified job
  - ngsub exercise_3.xrsl
Debugging errors
Most of the time the Gridseed-ARC_UI provides the tools necessary for debugging the errors and problems experienced when using the infrastructure; in this section we will learn how to debug and deal with common problems.
proxy errors
When issuing any ng command, the validity of the grid proxy is checked first; if the proxy has expired or is missing, the command fails and reports the problem:
[gridseed01@ui-1 exercise_2]$ ngsub xtandem_1.xrsl
The proxy has expired
or
[gridseed01@ui-1 exercise_2]$ ngsub xtandem_1.xrsl
Could not determine location of a proxy certificate: globus_sysconfig: Could not find a valid proxy certificate file location/globus_sysconfig: Error with key filename/globus_sysconfig: File does not exist: /tmp/x509up_u1501 is not a valid file
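Both errors can be avoided by checking the remaining lifetime of the proxy before submitting. The sketch below converts the "timeleft" line printed by grid-proxy-info (as shown in the section on checking your grid proxy) into seconds; the helper name proxy_seconds_left and the one-hour threshold in the usage comment are assumptions made for illustration.

```shell
#!/bin/bash
# proxy_seconds_left: hypothetical helper. Reads grid-proxy-info output on
# stdin and prints the remaining lifetime in seconds, taken from the
# "timeleft : HH:MM:SS" line. Prints nothing if no such line is found.
proxy_seconds_left() {
    sed -n 's/^[[:space:]]*timeleft[[:space:]]*:[[:space:]]*//p' | head -n 1 |
        awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Possible use on the Gridseed-ARC_UI (not run here):
#   left=$(grid-proxy-info 2>/dev/null | proxy_seconds_left)
#   if [ "${left:-0}" -lt 3600 ]; then
#       grid-proxy-init
#   fi
```

With the output shown earlier (timeleft : 11:59:04) the function prints 43144.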
submission errors
Submission of a job may fail for several reasons
user's DN not in the grid-mapfile
[gridseed01@ui-1 exercise_2]$ ngsub xtandem_1.xrsl
Job submission failed due to: All targets rejected job requests
One or few jobs failed to be submitted
This can be sorted out together with the site administrator by inspecting the gridftpd.log file on the Gridseed-ARC_CE:
Dec 06 16:53:40 [29886] User subject: /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
Dec 06 16:53:40 [29886] Encrypt: 1
Dec 06 16:53:40 Warning: there is no local mapping for user
Dec 06 16:53:40 Proxy stored at /tmp/x509up_p29886.file63RJWe.1
Dec 06 16:53:40 Mapped to running user: root
Dec 06 16:53:40 Mapped to local id: 0
Dec 06 16:53:40 Mapped to local group id: 0
Dec 06 16:53:40 Mapped to local group name: root
Dec 06 16:53:40 Mapped user's home: /root
Dec 06 16:53:40 Error: unknown (non-gridmap) user is not allowed
Dec 06 16:53:40 [29886] User has no proper configuration associated
Dec 06 16:53:40 [29886] response: 535 Not allowed\\
Dec 06 16:53:40 [29886] Accept exited
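On the Gridseed-ARC_CE the site administrator can check directly whether a DN is mapped. The sketch below looks up a DN in a grid-mapfile, assuming the usual format of one quoted DN followed by the local account name per line; the helper name gridmap_user is hypothetical, and the location of the file on this site (conventionally /etc/grid-security/grid-mapfile) is an assumption.

```shell
#!/bin/bash
# gridmap_user DN FILE: hypothetical helper. Prints the local account that
# DN is mapped to in the grid-mapfile FILE; fails if DN is not listed.
# Assumed line format: "<DN>" <local account>
gridmap_user() {
    user=$(awk -v dn="\"$1\"" '
        index($0, dn) == 1 {
            sub(/^"[^"]*"[[:space:]]+/, "")   # drop the quoted DN
            print
            exit
        }' "$2")
    [ -n "$user" ] && printf '%s\n' "$user"
}
```

For example, `gridmap_user "/O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed" /etc/grid-security/grid-mapfile` prints the mapped account, or returns a non-zero status when the DN is missing, which corresponds to the "no local mapping for user" warning in the log above.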
mapped user not allowed to access local resources
This can be sorted out together with the site administrator by inspecting the gridftpd.log file on the Gridseed-ARC_CE:
Dec 06 16:59:17 Have connections: 0, max: 100
Dec 06 16:59:17 New connection
Dec 06 16:59:17 [30919] Accepted connection from 10.10.0.81:35765
Dec 06 16:59:17 [30919] response: 220 Server ready\\
Dec 06 16:59:17 [30919] User subject: /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
Dec 06 16:59:17 [30919] Encrypt: 1
Dec 06 16:59:17 Proxy stored at /tmp/x509up_p30919.fileSAqMrb.1
Dec 06 16:59:17 Initially mapped to local user: gridseed01nothere
Dec 06 16:59:17 Local user does not exist
Dec 06 16:59:17 [30919] User has no proper configuration associated
Dec 06 16:59:17 [30919] response: 535 Not allowed\\
Dec 06 16:59:17 [30919] Accept exited
---
-- Sergio - 2009-12-05

