Getting Started with ARC

Gridseed-ARC_UI command line interface

  • ngsub : find suitable resources and submit a job
  • ngstat : check the status of jobs and resources
  • ngcat : display stdout, stderr of a running job
  • ngget : retrieve the results of a finished job
  • ngkill : stop a job
  • ngclean : delete a job from a computing resource
  • ngsync : find user's jobs
  • ngls : list files on a storage resource or in job's sandbox
  • ngcp : transfer files to and from cluster and storages

The Batch system model

ARC can be viewed as an extended batch queue system. Jobs are submitted from a submission node (Gridseed-ARC_UI) to a server node that acts as a resource manager (Gridseed-ARC_CE). Jobs are processed according to the priority they have within the system queue. Local resources are selected and allocated for the execution of the job. Results are passed back either to the submission node or to an accessible storage resource.

Anatomy of an ARC job

  • job is described using extended Resource Specification Language (xrsl)
  • ngsub to submit the job
  • Runtime Environments used to access pre-installed applications
  • File transfer from the submission node or from an accessible storage resource to the Gridseed-ARC_CE
  • ARC transforms the grid job into a local batch system job
  • submission to the local batch system done and monitored by the ARC middleware
  • retrieval of results either back to the submission node or to an accessible storage resource.

Authentication

Checking your certificate

Your personal certificate is split into two separate files stored in a directory called .globus. We can check this with the following command:

[gridseed01@ui-1 ~]$ ls -lrt .globus/
total 12
-r-------- 1 gridseed02 gridseed 1751 Jul 30 15:21 userkey.pem
-rw-r--r-- 1 gridseed02 gridseed 3882 Jul 30 15:21 usercert.pem

As you can see, userkey.pem (the private part of the key) and usercert.pem (the public part of your key, i.e. the certificate itself) are stored in the .globus directory with the right permissions.

These two files are effectively your public and private keys, which will be used for the authenticated connections with all the other grid elements. It is essential that they have the correct file permissions; otherwise you won't be able to create a proxy.
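
As a quick sanity check of the permissions mentioned above, here is a minimal sketch that is not part of the tutorial material. The helper name check_key_perms is hypothetical, it assumes GNU stat (stat -c), and it treats modes 400 and 600 as acceptable for the private key, which is an assumption based on the listing above.

```shell
#!/bin/bash
# Hypothetical helper: report whether the private key is accessible by its
# owner only. Modes 400 and 600 are assumed to be acceptable.
check_key_perms() {
    local dir="${1:-$HOME/.globus}"
    local key="$dir/userkey.pem"
    local mode
    [ -f "$key" ] || { echo "missing: $key"; return 2; }
    mode=$(stat -c '%a' "$key")          # GNU stat: octal mode, e.g. 400
    case "$mode" in
        400|600) echo "ok: $key ($mode)"; return 0 ;;
        *)       echo "too permissive: $key ($mode)"; return 1 ;;
    esac
}
```

Calling check_key_perms without arguments inspects $HOME/.globus; a non-zero return code suggests fixing the mode with chmod 400 before trying to create a proxy.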

We can then take a look at the full certificate by issuing the following command:

[gridseed01@ui-1 ~]$ grid-cert-info 
Certificate:
 Data:
 Version: 3 (0x2)
 Serial Number: 25 (0x19)
 Signature Algorithm: sha1WithRSAEncryption
 Issuer: O=GRIDSEED, DC=seed, DC=grid, CN=GRIDSEED CA 1
 Validity
 Not Before: Dec 5 13:54:45 2009 GMT
 Not After : Dec 5 13:54:45 2010 GMT
 Subject: O=GRIDSEED, DC=seed, DC=grid, OU=Personal Certificate, CN=Gridseed01 Gridseed
 Subject Public Key Info:
 Public Key Algorithm: rsaEncryption
 RSA Public Key: (2048 bit)
 Modulus (2048 bit):
 00:c1:e8:63:ab:04:b1:1a:c7:92:19:22:92:df:34:
 35:df:52:49:09:71:1d:07:c3:2d:35:a9:ea:1a:7f:
 72:92:4c:4a:1b:16:e5:72:b4:3e:fe:e0:a2:12:ba:
 bf:c3:14:dc:c5:ee:64:56:ff:9f:67:58:fa:81:be:
 d4:72:6e:6a:00:67:ce:1b:b1:ed:55:86:f0:b3:af:
 fc:79:7b:43:d8:13:3e:5d:2b:35:c7:31:f0:fd:08:
 bd:5a:80:8c:cb:c1:69:65:99:d4:38:a2:ac:9f:c2:
 27:d4:13:41:61:5b:5c:45:cb:ca:69:37:63:aa:7a:
 45:53:18:0d:89:dd:de:f1:16:0f:a7:64:75:0e:c0:
 83:ac:02:62:23:36:36:b9:5a:e7:ee:7a:26:99:a7:
 40:48:52:eb:b6:52:94:69:1a:4b:9d:2f:0f:1a:a0:
 6b:2f:e2:bc:36:a0:c8:81:7e:7a:14:40:29:ac:4f:
 09:47:4f:09:53:5f:59:b3:ea:88:7a:22:a3:56:2f:
 55:2b:bf:75:57:9f:4a:3b:8c:f4:5a:ae:63:a4:0c:
 e2:c7:f7:d4:80:10:12:a4:0d:cc:80:ff:04:d4:9c:
 ba:76:93:8e:94:0e:1b:3a:71:21:ea:d6:12:0e:7b:
 e3:07:34:fa:b7:3d:9a:7e:17:2b:e8:08:cd:9f:0c:
 4e:d9
 Exponent: 65537 (0x10001)
 X509v3 extensions:
 X509v3 Basic Constraints: critical
 CA:FALSE
 X509v3 Key Usage: critical
 Digital Signature, Key Encipherment, Data Encipherment
 X509v3 Extended Key Usage: 
 TLS Web Client Authentication, E-mail Protection
 X509v3 CRL Distribution Points: 
 URI:http://ca.grid.seed/ca/ca.crl

 X509v3 Certificate Policies: 
 Policy: 1.3.6.1.4.1.10403.10.1.7
 Policy: 1.2.840.113612.5.2.2.1

 X509v3 Subject Key Identifier: 
 57:02:8E:B4:8A:37:D8:F1:EC:A0:BF:C9:D8:9F:3E:08:AD:91:13:79
 X509v3 Authority Key Identifier: 
 keyid:2B:12:26:16:31:57:2D:17:AE:A0:09:B0:3B:8F:0C:33:68:23:CC:E3

DirName:/O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
 serial:00

 X509v3 Subject Alternative Name: 
 email:pippo@nowhere
 Signature Algorithm: sha1WithRSAEncryption
 5e:e5:5d:1f:bd:c3:a8:05:ab:f5:b6:f4:a9:53:90:77:9d:49:
 19:c2:7c:83:6f:88:37:05:ea:f8:83:af:1d:08:d0:e4:ca:6a:
 28:f2:57:e4:a3:05:04:2b:4c:98:82:fe:a4:d0:52:52:92:fd:
 bd:61:28:74:50:e9:93:58:14:e2:25:70:92:ba:1e:a0:ad:ae:
 bf:a2:94:3a:56:d7:4a:e5:b5:de:21:3b:06:f7:d0:94:a8:4f:
 64:1e:0b:20:3d:d5:3a:bb:69:31:06:0e:7e:30:ff:b5:bd:12:
 e1:be:e8:dd:24:05:13:6f:ea:1e:bc:42:8d:61:84:ac:51:65:
 d4:64:05:5d:9e:af:7f:8e:b3:af:33:d6:ff:d2:a2:ad:d2:51:
 3d:38:3d:ae:cb:53:69:19:5a:97:43:da:79:c8:87:04:0a:3f:
 f8:40:45:52:75:fd:4c:f1:3a:64:ec:ec:b6:fe:54:3e:00:33:
 15:ec:41:ba:0c:0e:f6:08:9c:55:8d:d1:1b:35:59:35:f9:23:
 92:10:22:4f:e4:37:3c:ca:63:e1:84:ea:8b:63:1f:cd:0d:9b:
 ea:60:b7:cb:21:27:93:19:b5:a8:7c:da:b7:b2:c4:27:54:4a:
 e3:fc:d7:3b:f8:86:a7:c7:c3:ae:e2:8f:03:47:23:43:53:2a:
 b1:4b:d4:4f

There are three interesting things to check here:

  • dates of creation and of expiration
  • the name and subject of the Certification Authority which issued the certificate,
  • the Common Name (CN) of the certificate owner, and the certificate subject, which uniquely identifies the certificate owner.
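
When only the three items listed above are of interest, the full dump can be shortened. The following is a sketch that assumes the openssl command line tool is installed on the User Interface; the function name cert_summary is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print only the validity dates, the issuer and the
# subject of a certificate (defaults to the user certificate).
cert_summary() {
    local cert="${1:-$HOME/.globus/usercert.pem}"
    openssl x509 -in "$cert" -noout -dates -issuer -subject
}
```

Run cert_summary with no argument to inspect your own certificate, or pass the path of another PEM file.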

Creation of a grid proxy

We are now ready to create a proxy from our certificate; with it we can start interacting with the grid. Please remember that the passphrase requested is: gridseed

[gridseed01@ui-1 ~]$ grid-proxy-init 
Your identity: /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
Enter GRID pass phrase for this identity:
Creating proxy ................................. Done
Your proxy is valid until: Sun Dec 6 02:56:53 2009

Check your grid proxy

Once your proxy has been created, you can gather information on it through the grid-proxy-info command.

[gridseed01@ui-1 ~]$ grid-proxy-info 
subject : /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed/CN=279397512
issuer : /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
identity : /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
type : RFC 3820 compliant impersonation proxy
strength : 512 bits
path : /tmp/x509up_u1501
timeleft : 11:59:04
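
The timeleft line is the one to watch before submitting long jobs. The sketch below converts it into seconds so that a script can compare it with a threshold; the function name proxy_seconds_left is hypothetical, and the parsing assumes the output layout shown above.

```shell
#!/bin/bash
# Hypothetical helper: convert the "timeleft" line of grid-proxy-info
# output (read from stdin) into a number of seconds.
proxy_seconds_left() {
    awk '$1 == "timeleft" {
        split($NF, t, ":")          # t[1]=hours, t[2]=minutes, t[3]=seconds
        print t[1] * 3600 + t[2] * 60 + t[3]
    }'
}

# Usage on the User Interface (requires the grid-proxy-info command):
#   left=$(grid-proxy-info | proxy_seconds_left)
#   [ "${left:-0}" -lt 3600 ] && echo "Less than one hour left: run grid-proxy-init"
```

With the output shown above (11:59:04) the function prints 43144.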

Alternatively, one can use the ngtest command with the -E (-certificate) option to print information about the installed user and CA certificates:

[gridseed01@ui-1 ~]$ ngtest -E
Certificate information:

Certificate: /home/gridseed01/.globus/usercert.pem
Subject name: /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
Valid until: Sun Dec 5 14:54:45 2010

Proxy: /tmp/x509up_u1501
Proxy-subject: /O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed/CN=279397512
Valid for: 11 hours, 58 minutes, 54 seconds

Certificate issuer: /O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1

CA-certificates installed:
/O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
/O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1

Browsing the Information System

The ARC information system is an OpenLDAP-based system which is derived from the Globus Monitoring and Discovery Services framework. It uses BDII together with an updated MDS LDAP Schema. Nordugrid computing resources publish information about their available resources using LDAP servers. Contact information for these LDAP servers is collected into the Nordugrid Information System.

ngstat command

The ngstat command is used for obtaining the status of jobs that have been submitted with ARC; with the -q option it can also be used to retrieve information about the status of a cluster resource.

USAGE: ngstat [options] [job ...]

Options:
 -a, -all all jobs
 -i, -joblist filename file containing a list of jobids
 -c, -cluster [-]name explicitly select or reject a specific cluster
 -C, -clustlist [-]filename list of clusters to select or reject
 -s, -status statusstr only select jobs whose status is statusstr
 -g, -giisurl url url to a GIIS
 -G, -giislist filename list of GIIS urls
 -q, -queues show information about clusters and queues
 -l, -long long format (more information)
 -t, -timeout time timeout in seconds (default 20)
 -d, -debug debuglevel from -3 (quiet) to 3 (verbose) - default 0
 -x, -anonymous use anonymous bind for MDS queries (default)
 -X, -gsi use gsi-gssapi bind for MDS queries
 -v, -version print version information
 -h, -help print this help

Getting information on a cluster resource

Use ngstat to obtain information about a given resource (by default, all ARC commands available on Gridseed-ARC_UI point to Gridseed-ARC_CE):

[gridseed01@ui-1 ~]$ ngstat -q -l
Cluster arc-ce.grid.seed
 Alias: GridSeed ARC test cluster
 Contact: gsiftp://arc-ce.grid.seed:2811/jobs
 Host Certificate Issuer: /O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
 Owner:
 Democritos
 Architecture: i686
 Operating System:
 Scientific Linux SL release 4.7 (Beryllium)
 Size of Scratch Directory: 7760510976 bytes
 Free Space in Scratch Directory: 5606735872 bytes
 Size of Cache Directory: 49999249408 bytes
 Free Space in Cache Directory: 49999249408 bytes
 Number of CPUs: 8
 Number of Used CPUs: 0
 Number of Jobs: 0
 Number of Computers:
 8 processors: 1
 Cluster Support:
 root@localhost
 Installed Middleware:
 nordugrid-arc-0.6.3
 globus-4.0.8-0.12.el4ng
 Installed Runtime Environments:
 APPS/LIFE/TANDEM-09-04-01-1
 Network Access on Cluster Nodes:
 outbound
 Session directory lifetime: 3 days
 Trusted Certificate Authorities:
 /O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
 /O=GRIDSEED/DC=seed/DC=grid/CN=GRIDSEED CA 1
 Entry valid from: 2009-12-05 15:13:12
 Entry valid to: 2009-12-05 15:13:42
Queue default
 Status: active
 Number of Running Jobs: 0
 Max CPU Time: 2 days
 Max Wall Time: 3 days
 Scheduling Policy: FIFO
 Number of CPUs: 8
 Entry valid from: 2009-12-05 15:13:12
 Entry valid to: 2009-12-05 15:13:42
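
The long listing is easy to post-process. As an illustration, the sketch below derives the number of idle CPUs from the two CPU counters of the cluster entry; the function name free_cpus is hypothetical, and the field labels are taken from the output above.

```shell
#!/bin/bash
# Hypothetical helper: read "ngstat -q -l" output on stdin and print the
# number of CPUs that are currently not in use on the cluster.
free_cpus() {
    awk -F': *' '
        /Number of CPUs:/ && total == "" { total = $2 }   # first match: cluster entry
        /Number of Used CPUs:/           { used = $2 }
        END { print total - used }
    '
}

# Usage on the User Interface:
#   ngstat -q -l | free_cpus
```

Only the first "Number of CPUs" line is used, because the same label appears again in the queue section.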

Using ldapsearch

The ARC information system can also be queried using ldapsearch:

 [gridseed01@ui-1 ~]$ ldapsearch -x -h arc-ce.grid.seed -p 2135 -b 'Mds-Vo-name=local,o=grid' 

This command prints all the information published by the ARC information system following the Nordugrid/ARC LDAP schema.

Filter information from ldapsearch: list available applications

Display information about available application tags from a given computing resource

[gridseed01@ui-1 ~]$ ldapsearch -x -h arc-ce.grid.seed -p 2135 -b 'Mds-Vo-name=local,o=grid' \
| grep nordugrid-cluster-runtimeenvironment 
nordugrid-cluster-runtimeenvironment: APPS/LIFE/TANDEM-09-04-01-1
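
If only the application tags are needed, without the attribute name, the filter can be wrapped in a small function. This is a sketch; list_rtes is a hypothetical name, and it relies only on the attribute name shown in the output above.

```shell
#!/bin/bash
# Hypothetical helper: read ldapsearch output on stdin and print the
# runtime environment names, one per line, without duplicates.
list_rtes() {
    sed -n 's/^nordugrid-cluster-runtimeenvironment: //p' | sort -u
}

# Usage: pipe the ldapsearch command shown above into list_rtes.
```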

Exercises

Introduction

Basic job workflow:

  • Obtain access to a User Interface (Gridseed-ARC_UI)
  • Request a user certificate from a Certification Authority
  • Deploy the signed certificate on the User Interface
  • Create grid proxy
  • Write a job description
  • Submit job
  • Monitor the progress of the job
  • Fetch the results

In your $HOME on the Gridseed-ARC_UI you will find the following folders:

[gridseed01@ui-1 ~]$ ls 
results  tutorial_exercises

results will contain the output of the examples you will run through this tutorial

tutorial_exercises contains relevant material for the exercises (xrsl files, scripts, input files); it is organized in three groups of exercises:

  • Exercise 1: learn the basic usage of the infrastructure
  • Exercise 2: learn how to use RTEs and storage resources by running X!Tandem application
  • Exercise 3: learn how a new application is integrated in the infrastructure and how to run it

Exercise 1: basic usage

Enter folder tutorial_exercises/exercise_1

[gridseed01@ui-1 ~]$ cd tutorial_exercises/exercise_1/

The README file explains in detail how to run each individual exercise.

[gridseed01@ui-1 exercise_1]$ cat README  

Exercise_1_1

The very simple exercise_1_1.xrsl shows the basic components of a job description:

(executable=/bin/env)

The executable to be submitted as a main task to a Local Resource Management System (LRMS). Executable is a file that has to be executed as the main process of the task. It could be either a precompiled binary, or a script. Users may transfer their own executables, or use the ones known to be already installed on the remote system (CE).

(stdout=std.out) 
(stderr=std.err)

The standard output and standard error files. Even if they are not listed as part of the outputFiles, they will be staged back to the User Interface once the job is retrieved.

(gmlog=gridlog) 

The name of the directory containing grid-specific diagnostics for the job. This directory is kept in the session directory to be available for retrieval.

(jobname="Exercise_1_1")

User-specified job name. This name is meant for convenience of the user. It can be used to select the job while using the Gridseed-ARC_UI. It is also available through the Information System.

(cputime=10) 
(disk=1) 

Example of resource specifications. Using these attributes, the resource broker will try to match the job requirements with available computing and storage resources. These parameters affect the matchmaking process for the resource selection, which determines where the job will be executed.
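
Put together, the attributes above form a complete job description. The sketch below writes such a file from the shell; the function name write_basic_xrsl and the default file name my_first_job.xrsl are hypothetical, and the exercise_1_1.xrsl shipped with the tutorial may differ in detail.

```shell
#!/bin/bash
# Hypothetical helper: write a job description built from the attributes
# discussed above. The leading "&" joins all attributes into one request.
write_basic_xrsl() {
    local target="${1:-my_first_job.xrsl}"   # file name is an assumption
    cat > "$target" <<'EOF'
&(executable=/bin/env)
(stdout=std.out)
(stderr=std.err)
(gmlog=gridlog)
(jobname="Exercise_1_1")
(cputime=10)
(disk=1)
EOF
    echo "$target"
}
```

The function prints the name of the file it wrote, so the result can be handed to ngsub -f.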

Running exercise_1_1

  • Check validity of proxy
grid-proxy-info
  • Submit job
ngsub -f exercise_1_1.xrsl

If a job is successfully submitted, a job identifier (job ID) is printed to standard output. The job ID uniquely identifies the job while it is being executed. A typical job ID looks as follows:

gsiftp://arc-ce.grid.seed:2811/jobs/10308913211503407485

You should use this as a handle to refer to the job when doing other job manipulations, such as querying job status (ngstat), killing it (ngkill) or retrieving the result (ngget).
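
To keep the handle for later commands, the job ID can be captured in a shell variable. The sketch below does not depend on the exact wording printed by ngsub: it simply looks for the first string of the form shown above. The function name extract_jobid is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the first job ID (gsiftp://.../jobs/<digits>)
# found in the text read from stdin.
extract_jobid() {
    grep -Eo 'gsiftp://[^[:space:]]+/jobs/[0-9]+' | head -n 1
}

# Usage on the User Interface:
#   jobid=$(ngsub -f exercise_1_1.xrsl | extract_jobid)
#   ngstat "$jobid"
```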

  • Check status of submitted job. A job can be referred to either by the job ID that was returned by ngsub at submission time, or by its name if the job description specified a job name.
ngstat <jobid>
ngstat Exercise_1_1
  • List content of job's sandbox
ngls <jobid>
  • Check standard out of submitted job
ngcat <jobid>
  • Check standard err of submitted job
ngcat -e <jobid> 
  • Check the internal grid error log of the job
ngcat -l <jobid>
  • Once the job status is reported as FINISHED, retrieve the job result
ngget <jobid> -j
ngget Exercise_1_1 -j
  • Results are retrieved from the Gridseed-ARC_CE to the Gridseed-ARC_UI into your $HOME/results folder
  • The files to be retrieved by ngget should be described in the xRSL as follows:
(outputfiles=(file1 "")(file2 ""))
  • By default, ngget will create in your $HOME/results a new folder with the same name as the remote session directory (typically a numerical string). This new directory will contain all the files listed for download in your job description. If you would like to store the files in another location, use the -dir option. The -j option will assign your job name to the local directory with the output files (be careful not to give all jobs the same name when using this option).
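
The check-then-retrieve steps above can be automated with a small polling loop. This is only a sketch under stated assumptions: the function names job_status and wait_and_get are hypothetical, ngstat is assumed to print a line starting with "Status:" for a job, and FAILED, KILLED and DELETED are assumed to be the final states other than FINISHED.

```shell
#!/bin/bash
# Hypothetical helper: print the value of the first "Status:" line found
# in ngstat output read from stdin (the output layout is an assumption).
job_status() {
    sed -n 's/^[[:space:]]*Status:[[:space:]]*//p' | head -n 1
}

# Hypothetical helper: poll a job until it reaches a final state and
# retrieve the results if it finished.
wait_and_get() {
    local job="$1" interval="${2:-30}" status
    while true; do
        status=$(ngstat "$job" | job_status)
        case "$status" in
            FINISHED)
                ngget "$job" -j
                return ;;
            FAILED|KILLED|DELETED)      # assumed final states
                echo "job ended as $status" >&2
                return 1 ;;
        esac
        sleep "$interval"
    done
}
```

For example, wait_and_get Exercise_1_1 60 checks the job once a minute and calls ngget as soon as the status is FINISHED.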

Exercise_1_2

Modify exercise_1_2.xrsl to print out hostname instead of /bin/env

Follow instructions on README to run the exercise

Exercise_1_3

This is an example of running an executable and passing some arguments:

& (executable=/bin/echo)
(arguments="Hello World")

will result in the execution of

/bin/echo "Hello World"  

Follow instructions on README to run the exercise

Exercise_1_4

Modify exercise_1_4.xrsl to print kernel information to std.out:

uname -a

Follow instructions on README to run the exercise

Exercise_1_5

This is an example of running a bash script that is passed to the Gridseed-ARC_CE as part of the job submission

Check run_test.sh

#!/bin/bash

echo "[`date`] Start"

env

echo "---------------"
echo "Reading content of current folder [$PWD]"
ls -la

# Exercise 1_6
# Uncomment the following part
# echo "---------------"
# echo "Listing content of archive " 
# if [ -s archive.tgz ]; then   
#   tar tfz archive.tgz
# else
#   echo "File archive.tgz Not found"
#   exit 1
# fi

# Exercise 1_7
# Uncomment the following part
# echo "---------------" > output_list.txt
# echo "Listing content of archive "  >> output_list.txt
# if [ -s archive.tgz ]; then    
#   tar tfz archive.tgz >> output_list.txt
# else
#   echo "File archive.tgz Not found" >> output_list.txt
#   exit 1
# fi

# Exercise 1_8
# Uncomment the following part 
# sleep 180

echo "[`date`] Done"

Follow instructions on README to run the exercise

ngls Exercise_1_5 lists the content of the job's sandbox on the Gridseed-ARC_CE

ngls should list also the file run_test.sh that has been transferred from the Gridseed-ARC_UI

Also check the file transfer operation with ngcat -l:

Dec 06 11:30:35 Downloader started
Dec 06 11:30:35 Check user uploadable file: /run_test.sh
Dec 06 11:30:35 User has uploaded file 

Exercise_1_6

In this exercise we show the use of the inputFiles directive

(inputFiles=(archive.tgz "gsiftp://arc-ce.grid.seed/repo/exercise_2/archive.tgz"))  
Format: (inputFiles=(<filename> <location>) ... )

List of files to be copied to the Gridseed-ARC_CE before the execution, where:

<filename>

is a File name, local to the Gridseed-ARC_CE and always relative to the session directory

<location> 

is the location of the file (gsiftp, https, ftp, http URLs, or a path, local to the submission node Gridseed-ARC_UI). If void ("", use the quotes!), the input file is taken from the submission directory.

In this example

gsiftp://arc-ce.grid.seed/repo

is the storage resource where the files have been pre-staged (it could be any remote storage resource)
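
When several files have to be staged, writing the attribute by hand becomes error prone. The sketch below builds it from name=location pairs following the format given above; the function name xrsl_input_files and the pair syntax are hypothetical conveniences, not part of ARC.

```shell
#!/bin/bash
# Hypothetical helper: build an inputFiles attribute from "name=location"
# pairs. An empty location ("name=") produces the "" form, i.e. the file
# is taken from the submission directory.
xrsl_input_files() {
    local pair out="(inputFiles="
    for pair in "$@"; do
        # text before the first "=" is the name, the rest is the location
        out="$out(${pair%%=*} \"${pair#*=}\")"
    done
    printf '%s)\n' "$out"
}

# Usage:
#   xrsl_input_files 'archive.tgz=gsiftp://arc-ce.grid.seed/repo/exercise_2/archive.tgz' 'run_test.sh='
# prints:
#   (inputFiles=(archive.tgz "gsiftp://arc-ce.grid.seed/repo/exercise_2/archive.tgz")(run_test.sh ""))
```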

Modify run_test.sh to list the content of the tar archive named "archive.tgz" that will be passed as input file in the job description (uncomment the Exercise_1_6 part in the run_test.sh script)

Follow instructions on README to run the exercise

Exercise_1_7

In this exercise we show the use of the outputFiles directive

(outputFiles=(output_list.txt ""))
Format: (outputFiles=(<string> <URL>) ... )

List of files to be retrieved once job is finished

string

File name, local to the Gridseed-ARC_CE

URL

URL of the remote file; if void ("", use the quotes!), the file is retrieved on Gridseed-ARC_UI when ngget is issued.

In this example the file output_list.txt will be retrieved on Gridseed-ARC_UI.

Modify run_test.sh to output the list of the tar archive named archive.tgz to a file named output_list.txt (uncomment the Exercise_1_7 part in the run_test.sh script)

Follow instructions on README to run the exercise

Exercise_1_8

This example shows how to kill a submitted job using ngkill. A job can be killed at almost any stage of processing through the Grid.

ngkill [options] [job ...] 
Options: 
-a, -all all jobs 
-i, -joblist filename file containing a list of jobIDs 
-c, -clusters show information about clusters 
-C, -clustlist [-]text filename list of sites (clusters) to select or reject 
-s, -status statusstr only select jobs whose status is statusstr 
-keep keep files on gatekeeper (do not clean) 
-t, -timeout time timeout for queries (default 40 sec) 
-d, -debug debuglevel debug level, from -3 (quiet) to 3 (verbose) - default 0 
-x, -anonymous use anonymous bind for queries (default) 
-X, -gsi use GSI-GSSAPI bind for queries 
-v, -version print version information 
-h, -help print help page 
Arguments: 
job ... list of job IDs and/or jobnames

Modify run_test.sh to introduce a 3-minute sleep (uncomment the Exercise_1_8 part in the run_test.sh script)

Follow instructions on README to run the exercise

Exercise 2: Advanced use of the infrastructure (RTE, SE and group submit)

Runtime Environments provide a means to make software packages installed at the systems available on the Grid. Users can specify in the job description file that a specific runtime environment needs to be present in the target system. That avoids the need to send the actual application binary as part of the computation: only input files need to be sent.

In this group of exercises the X!Tandem application will be used

Enter folder tutorial_exercises/exercise_2

[gridseed01@ui-1 ~]$ cd tutorial_exercises/exercise_2/

The README file explains in detail how to run each individual exercise.

[gridseed01@ui-1 exercise_2]$ cat README 

xtandem_1

In this exercise a simple and pre-configured X!Tandem job will be submitted.

We will use input files directly available on the Gridseed-ARC_UI and the pre-defined TANDEM-09-04-01-1 RTE

List the content of xtandem_1.xrsl:

[gridseed01@ui-1 exercise_2]$ cat xtandem_1.xrsl 
&(executable="xtandem_run.sh")
(inputFiles=(input.xml "xtandem_input.xml"))
(outputFiles=(output.xml ""))
(stdout="std.out")
(stderr="std.err")
(gmlog=gridlog) 
(jobname="XTANDEM_sample")
(runtimeEnvironment=APPS/LIFE/TANDEM-09-04-01-1)

The runtimeEnvironment directive

Format:(runTimeEnvironment=<string>)
string environment name

The site to submit the job to will be chosen by the Gridseed-ARC_UI among those advertising specified runtime environments. Before starting the job, the Gridseed-ARC_CE will set up environment variables and paths according to those requested.

In this example

APPS/LIFE/TANDEM-09-04-01-1

will be requested

On the Gridseed-ARC_CE the RTE is deployed in the $runtimedir folder; it is a bash script that will be executed three times during the lifetime of a submitted job

[root@arc-ce ~]# cat /export/software/APPS/LIFE/TANDEM-09-04-01-1 
#!/bin/bash

# shared directory for application installation
application_base_path='/export/apps/tandem-linux-09-04-01-1'
# version
tandem_version='09-04-01-1'

case "$1" in
0 )
    # execution of RTE on the frontend prior submission of LRMS job
    # Here site specific settings could be applied
    # For example:
    # SGE LRMS requires the setting of the Parallel Environment before submission to the LRMS:
    # export joboption_nodeproperty_0=mpich
;;

1 )
    # First execution of RTE on compute node - before starting execution of "executable" as specified
    # in submitted xrsl

    export TANDEM_LOCATION=$application_base_path
    export TANDEM_TAXONOMY=$application_base_path/bin
    export PATH=$TANDEM_LOCATION/bin:$PATH
    export TANDEMRUN="$TANDEM_LOCATION/bin/tandem.exe"

;;

2 )
    # Second and final execution of RTE on compute node - at the end of the submitted LRMS job
    # Clean up
;;


* )
    # Now, calling argument is wrong or missing.
    # If call was made from NorduGrid ARC, it is considered
    # an error. If this script is to be used also to initialize
    # MPI environment for local jobs in cluster, raising error here
    # could be improper.
    return 1
;;
esac
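
The effect of the stage argument can be tried locally without a grid job. The sketch below writes a tiny stand-in RTE script and sources it for stages 1 and 2; DEMO_LOCATION, /opt/demo and the clean-up done in stage 2 are hypothetical and exist only for this demonstration.

```shell
#!/bin/bash
# Write a tiny stand-in RTE script to a temporary file (all names below
# are hypothetical and exist only for this demonstration).
rte=$(mktemp)
cat > "$rte" <<'EOF'
case "$1" in
0 ) ;;                                   # front end, before LRMS submission
1 ) export DEMO_LOCATION=/opt/demo       # compute node, before the executable
    export PATH=$DEMO_LOCATION/bin:$PATH ;;
2 ) unset DEMO_LOCATION ;;               # compute node, after the job: clean up
* ) return 1 ;;                          # wrong or missing stage argument
esac
EOF

# The script is sourced, not executed, so exported variables stay in the
# calling shell and "return" is valid.
source "$rte" 1
echo "after stage 1: DEMO_LOCATION=${DEMO_LOCATION:-unset}"
source "$rte" 2
echo "after stage 2: DEMO_LOCATION=${DEMO_LOCATION:-unset}"
rm -f "$rte"
```

Because the RTE is sourced, the variables exported in stage 1 are visible to the executable that runs afterwards in the same shell, which is how TANDEM_LOCATION and TANDEMRUN reach the job script.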

Follow instructions on README to run the exercise

The command

ngcat -l xtandem_1

will show the details on when the requested RTE will be processed

...
if [ ! -z "$RUNTIME_CONFIG_DIR" ] ; then
if [ -r "${RUNTIME_CONFIG_DIR}/APPS/LIFE/TANDEM-09-04-01-1" ] ; then
    runtimeenvironments="${runtimeenvironments}APPS/LIFE/TANDEM-09-04-01-1;"
    source ${RUNTIME_CONFIG_DIR}/APPS/LIFE/TANDEM-09-04-01-1 1 
...
if [ ! -z "$RUNTIME_CONFIG_DIR" ] ; then
  if [ -r "${RUNTIME_CONFIG_DIR}/APPS/LIFE/TANDEM-09-04-01-1" ] ; then
    source ${RUNTIME_CONFIG_DIR}/APPS/LIFE/TANDEM-09-04-01-1 2 
...

xtandem_2

In this exercise the input file test_spectra.mgf will be staged from a storage resource directly to the Gridseed-ARC_CE during the submission process of the job. We will use a different input file input.xml that will reference the spectra.mgf file from the job's sandbox instead of the one already available in TANDEM_LOCATION

[gridseed01@ui-1 exercise_2]$ diff xtandem_input.xml  xtandem_input_SE.xml 
18c18
<         <note type="input" label="spectrum, path">TANDEM_LOCATION/bin/test_spectra.mgf</note>
---
>         <note type="input" label="spectrum, path">./spectra.mgf</note>

Modify xtandem_2.xrsl to stage the file xtandem_input_SE.xml as inputFile with the name input.xml

Modify xtandem_2.xrsl to stage the file test_spectra.mgf located at gsiftp://arc-ce.grid.seed:2811/repo/test_spectra.mgf as inputFile with the name spectra.mgf

Follow instructions on README to run the exercise

xtandem_3

In this exercise, a group of four X!Tandem jobs will be submitted from a single .xrsl file

This example shows how a group submission takes place. Submissions are done sequentially (one after another), but the brokering process is based on the information gathered at the beginning of the submission (this saves several ldapsearch queries, thus reducing the submission time)

Follow instructions on README to run the exercise

Exercise 3: application integration

In this exercise a new application will be deployed on the Gridseed-ARC_CE. A new dedicated RTE (APPS/LIFE/RSPACE-0.81) will be defined and we will learn how to use it to execute the application.

Running the rspace application locally

The application normally expects the following layout in the current folder:

./:
total 20
-rw-r--r--  1 gridseed01 gridseed  857 Mar 27  2004 INPUT
drwxr-xr-x  2 gridseed01 gridseed 4096 Mar 27  2004 potentials

./potentials:
total 180
-rw-r--r--  1 gridseed01 gridseed 79059 Mar 27  2004 H
-rw-r--r--  1 gridseed01 gridseed 94156 Mar 27  2004 C
  • The application is then called as follows:
rspace-0.81_i386-linux_SERIAL 

Content of /APPS/LIFE/RSPACE-0.81 RTE

[root@arc-ce ~]# cat /export/software/APPS/LIFE/RSPACE-0.81 
#!/bin/bash

# shared directory for application installation
application_base_path='/export/apps/rspace-0.81'
# version
rspace_version='0.81'

case "$1" in
0 )
    # execution of RTE on the frontend prior submission of LRMS job
    # Here site specific settings could be applied
    # For example:
    # SGE LRMS requires the setting of the Parallel Environment before submission to the LRMS:
    # export joboption_nodeproperty_0=mpich
;;

1 )
    # First execution of RTE on compute node - before starting execution of "executable"
    # as specified in submitted xrsl

    export RSPACE_LOCATION=$application_base_path
    export RSPACERUN="$RSPACE_LOCATION/rspace-0.81_i386-linux_SERIAL"

;;

2 )
    # Second and final execution of RTE on compute node - at the end of the submitted LRMS job
    # Clean up
;;


* )
    # Now, calling argument is wrong or missing.
    # If call was made from NorduGrid ARC, it is considered
    # an error. If this script is to be used also to initialize
    # MPI environment for local jobs in cluster, raising error here
    # could be improper.
    return 1
;;
esac

Guidelines for creating an RSPACE ARC job

  • Input files are located at gsiftp://arc-ce.grid.seed:2811/repo/RSPACE/
    • browse the content of the storage resource with ngls
  • Modify exercise_3.xrsl to:
    • run RSPACE application
    • Stage required input files from remote storage location
  • Submit job
    • ngsub exercise_3.xrsl
  • Verify that the results are produced as expected
  • Modify exercise_3.xrsl to:
    • Stage outputfiles in gsiftp://arc-ce.grid.seed:2811/repo/results/$LOGNAME/
  • re-submit modified job
    • ngsub exercise_3.xrsl

Debugging errors

Most of the time the Gridseed-ARC_UI provides the tools necessary for debugging the errors and problems experienced when using the infrastructure; in this section we will learn how to debug and deal with common problems.

proxy errors

When issuing any ng command, the validity of the grid proxy is checked first; if the proxy has expired or is missing, the command fails and reports the problem:

[gridseed01@ui-1 exercise_2]$ ngsub xtandem_1.xrsl 
The proxy has expired

or

[gridseed01@ui-1 exercise_2]$ ngsub xtandem_1.xrsl 
Could not determine location of a proxy certificate: globus_sysconfig:
Could not find a valid proxy certificate file
location/globus_sysconfig: Error with key filename/globus_sysconfig:
File does not exist: /tmp/x509up_u1501 is not a valid file
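
Both messages have the same remedy, creating a new proxy with grid-proxy-init. A wrapper script can recognise them as sketched below; the function name proxy_hint is hypothetical, and only the two message texts shown above are matched.

```shell
#!/bin/bash
# Hypothetical helper: read the error text of an ng command on stdin and
# print a one-line hint.
proxy_hint() {
    local msg
    msg=$(cat)
    case "$msg" in
        *"The proxy has expired"*)
            echo "proxy expired: run grid-proxy-init" ;;
        *"Could not find a valid proxy certificate file"*)
            echo "no proxy found: run grid-proxy-init" ;;
        *)
            echo "not a proxy error" ;;
    esac
}

# Usage on the User Interface:
#   ngsub xtandem_1.xrsl 2>&1 | proxy_hint
```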

submission errors

Submission of a job may fail for several reasons:

user's DN not in the grid-mapfile

[gridseed01@ui-1 exercise_2]$ ngsub xtandem_1.xrsl 
Job submission failed due to: All targets rejected job requests
One or few jobs failed to be submitted

This can be sorted out together with the site administrator by inspecting the gridftpd.log file on the Gridseed-ARC_CE:

Dec 06 16:53:40 [29886] User subject: 
/O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
Dec 06 16:53:40 [29886] Encrypt: 1
Dec 06 16:53:40 Warning: there is no local mapping for user
Dec 06 16:53:40 Proxy stored at /tmp/x509up_p29886.file63RJWe.1
Dec 06 16:53:40 Mapped to running user: root
Dec 06 16:53:40 Mapped to local id: 0
Dec 06 16:53:40 Mapped to local group id: 0
Dec 06 16:53:40 Mapped to local group name: root
Dec 06 16:53:40 Mapped user's home: /root
Dec 06 16:53:40 Error: unknown (non-gridmap) user is not allowed
Dec 06 16:53:40 [29886] User has no proper configuration associated
Dec 06 16:53:40 [29886] response: 535 Not allowed\\
Dec 06 16:53:40 [29886] Accept exited

mapped user not allowed to access local resources

This can be sorted out together with the site administrator by inspecting the gridftpd.log file on the Gridseed-ARC_CE:

Dec 06 16:59:17 Have connections: 0, max: 100
Dec 06 16:59:17 New connection
Dec 06 16:59:17 [30919] Accepted connection from 10.10.0.81:35765
Dec 06 16:59:17 [30919] response: 220 Server ready\\
Dec 06 16:59:17 [30919] User subject: 
/O=GRIDSEED/DC=seed/DC=grid/OU=Personal Certificate/CN=Gridseed01 Gridseed
Dec 06 16:59:17 [30919] Encrypt: 1
Dec 06 16:59:17 Proxy stored at /tmp/x509up_p30919.fileSAqMrb.1
Dec 06 16:59:17 Initially mapped to local user: gridseed01nothere
Dec 06 16:59:17 Local user does not exist
Dec 06 16:59:17 [30919] User has no proper configuration associated
Dec 06 16:59:17 [30919] response: 535 Not allowed\\
Dec 06 16:59:17 [30919] Accept exited
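
For a site administrator, the two cases above can be told apart by the key line in each log excerpt. The following sketch does that for a log fragment read from standard input; the function name classify_gridftpd_log is hypothetical, and it only knows the two messages shown in this section.

```shell
#!/bin/bash
# Hypothetical helper: classify a gridftpd.log excerpt read from stdin
# using the two messages shown in this section.
classify_gridftpd_log() {
    local log
    log=$(cat)
    case "$log" in
        *"unknown (non-gridmap) user is not allowed"*)
            echo "DN missing from grid-mapfile" ;;
        *"Local user does not exist"*)
            echo "DN mapped to a non-existent local account" ;;
        *)
            echo "no known mapping problem found" ;;
    esac
}

# Usage on the Gridseed-ARC_CE (the log location depends on the site):
#   tail -n 50 gridftpd.log | classify_gridftpd_log
```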

---

-- Sergio - 2009-12-05