This tutorial sets up an SMP parallel execution on GRIDSEED resources using the reserve_smp_nodes tool.
The old page, which uses gsiftp for data management, can be found here.

Author: Riccardo di Meo

- Download the scripts:

 http://www.escience-lab.org/software/dimeo/cost_school/simple_espresso/pw_simple_lcg.tar.gz

and untar them somewhere. The archive contains 2 directories, one for each of the two steps of the tutorial.
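
For readers who prefer to script this step, the unpacking can be sketched in Python with the standard tarfile module. The function name untar and the destination directory are illustrative choices, not part of the tutorial's scripts, and the sketch assumes the archive has already been downloaded.

```python
import tarfile
from pathlib import Path, PurePosixPath


def untar(archive, dest="."):
    """Extract a tar archive into dest and return its top-level entries."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect gzip or bzip2 compression on its own.
    with tarfile.open(archive, "r:*") as tar:
        names = tar.getnames()
        tar.extractall(dest_path)
    # Keep only the first path component of every member name.
    tops = set()
    for name in names:
        parts = PurePosixPath(name).parts
        if parts:
            tops.add(parts[0])
    return sorted(tops)
```

Calling untar("pw_simple_lcg.tar.gz") should then list the two step directories (pw_threads and pw_mpishmem, going by the names used later in this tutorial).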

Step 1: the thread example

- Enter the pw_threads directory.

- Download the following files:

 http://www.escience-lab.org/software/dimeo/cost_school/simple_espresso/bin.tar.gz
 http://www.escience-lab.org/software/dimeo/cost_school/simple_espresso/pseudopot.tar.gz
 http://www.escience-lab.org/software/dimeo/cost_school/simple_espresso/input.tar.gz

- Upload them to an SE using globus-url-copy or uberftp (keep in mind that you are working in a shared environment, so create one or more directories of your own to put your files in).
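
As a rough illustration of the upload, the sketch below only builds the globus-url-copy command line (source URL followed by destination URL) without running it. The SE host name se.example.org and the remote directory are placeholders, not real GRIDSEED values, and the helper upload_command is hypothetical. It also assumes a POSIX system, so that absolute paths start with a slash.

```python
import os


def upload_command(local_file, se_host, remote_dir):
    """Return the globus-url-copy argument list that copies local_file to the SE."""
    # file:// followed by an absolute path gives file:///abs/path.
    source = "file://" + os.path.abspath(local_file)
    target = "gsiftp://{}/{}/{}".format(
        se_host, remote_dir.strip("/"), os.path.basename(local_file)
    )
    return ["globus-url-copy", source, target]


# The three packages used in this step; host and directory are placeholders.
for package in ("bin.tar.gz", "pseudopot.tar.gz", "input.tar.gz"):
    print(" ".join(upload_command(package, "se.example.org", "/data/myuser")))
```

The printed lines can be checked by eye before being pasted into a shell, or the list can be handed to subprocess.run once the host and directory are correct.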

- The locations of the files are hard coded in 2 places:

  1. run_example_threads.py, at lines 41, 44 and 47 (for the last of these you only need to specify a directory, created by you, where some logs will be dumped).
  2. run_example.jdl, where the Arguments field holds 2 values: the first is the location of the input.tar.gz package, and the second is the location where you would like the output of the espresso code to be stored as a tar.gz archive (both locations should be lfn URLs).
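
To make the shape of the Arguments field concrete, here is a small sketch that formats it from the two lfn URLs and rejects values that are not lfn URLs. The helper jdl_arguments is hypothetical and the URLs in the usage line are examples only; the rest of run_example.jdl is not reproduced here.

```python
def jdl_arguments(input_lfn, output_lfn):
    """Format the JDL Arguments attribute from the input and output lfn URLs."""
    for url in (input_lfn, output_lfn):
        if not url.startswith("lfn:"):
            raise ValueError("not an lfn URL: " + url)
        if " " in url:
            raise ValueError("URL must not contain spaces: " + url)
    # The two values are separated by a single space inside one quoted string.
    return 'Arguments = "{} {}";'.format(input_lfn, output_lfn)


print(jdl_arguments("lfn:/grid/gridseed/user/input.tar.gz",
                    "lfn:/grid/gridseed/user/output.tar.gz"))
```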

- Both examples (the current one, with threads, and the mpishmem one, later) can first be executed like normal serial jobs: run the 'thread' example using the simple jdl provided (suggestion: use the command glite-wms-job-submit).

- Collect the results using uberftp (from the location given as the second argument in the JDL) and the logs using glite-wms-job-get-output (they are also present on the SE, in the location specified in run_example_threads.py at line 47), then examine them, paying particular attention to the profiling data.

- Now examine the grid logs using glite-wms-job-logging-info to get an idea of the overhead associated with the grid.
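
One simple way to put a number on that overhead is to compare the time the job spent waiting with its total turnaround time. The sketch below assumes you copy three timestamps by hand from the logging information (submission, start of execution, end of execution). The timestamp format and the function name grid_overhead are assumptions made for illustration, not the actual output format of glite-wms-job-logging-info.

```python
from datetime import datetime


def grid_overhead(submitted, started, finished, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (waiting seconds, waiting time as a fraction of the turnaround)."""
    t_sub, t_run, t_end = (
        datetime.strptime(stamp, fmt) for stamp in (submitted, started, finished)
    )
    waiting = (t_run - t_sub).total_seconds()  # time before the job starts running
    total = (t_end - t_sub).total_seconds()    # full turnaround time
    return waiting, waiting / total
```

A job submitted at 10:00:00, started at 10:05:00 and finished at 10:25:00 would give 300 seconds of waiting, i.e. 20% of the turnaround.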

- You are now ready to run the code with reserve_smp_nodes. Download the package from:

 http://www.escience-lab.org/software/dimeo/reserve_smp_nodes-1.5.tar.bz2

and untar it.

- Enter the directory reserve_smp_nodes-1.5 and create a file named tasks.txt containing the following 2 lines:

!; 
2;<path to run_example_threads.py>;<lfn url for the input> <lfn url for the output>

Keep in mind that the first line is exactly 3 characters: exclamation mark, semicolon and one space.

The path and arguments in the second line should not contain spaces.

E.g.:

!; 
2;../pw_threads/run_example_threads.py;lfn:/grid/gridseed/user/input.tar.gz lfn:/grid/gridseed/user/output_reserve.tar.gz

where the third ;-separated field (the script arguments) has the same meaning as the Arguments field in the jdl (this is not a coincidence).

Note also that the second field, the path to the executable, should be corrected to point to the location of run_example_threads.py in your setup!
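
Because the format is easy to get wrong (trailing space on the first line, no spaces inside the path or the URLs), the file can also be generated. This is a minimal sketch: the helper tasks_file_content is hypothetical, the first field is simply copied from the example above, and ending the file with a newline is an assumption about what reserve_smp_nodes accepts.

```python
def tasks_file_content(script_path, input_lfn, output_lfn, first_field=2):
    """Build the two-line content of tasks.txt as described in this tutorial."""
    for value in (script_path, input_lfn, output_lfn):
        # Spaces and semicolons are the field separators, so they cannot appear.
        if " " in value or ";" in value:
            raise ValueError("no spaces or semicolons allowed in: " + value)
    header = "!; "  # exactly 3 characters: '!', ';' and one space
    task = "{};{};{} {}".format(first_field, script_path, input_lfn, output_lfn)
    return header + "\n" + task + "\n"


content = tasks_file_content(
    "../pw_threads/run_example_threads.py",
    "lfn:/grid/gridseed/user/input.tar.gz",
    "lfn:/grid/gridseed/user/output_reserve.tar.gz",
)
print(content, end="")
```

Writing content to tasks.txt with open("tasks.txt", "w") then gives a file equivalent to the hand-written example.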

- Run the command:

./reserve_smp_nodes -r ce-1.grid.seed:2119/jobmanager-lcgpbs-gridseed -v gridseed -j 4 -J tasks.txt \
-T 500

which will submit 4 jobs to the ce-1 virtual cluster (ce-2 can be used as well) for the gridseed Virtual Organization, and will wait at most 500 seconds for 2 jobs to end up on the same virtual Worker Node.

If enough resources are present, the reserve_smp_nodes program should terminate with:

All tasks have been assigned!
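
If you want to drive the submission from a script, the command above and the check for the final message can be sketched as follows. Only the options shown in this tutorial (-r, -v, -j, -J, -T) are used; the helper names reserve_command and all_tasks_assigned are hypothetical, and the sketch assumes the success message appears verbatim in the program's output.

```python
def reserve_command(ce, vo, jobs, tasks_file, timeout):
    """Return the reserve_smp_nodes argument list used in this tutorial."""
    return [
        "./reserve_smp_nodes",
        "-r", ce,
        "-v", vo,
        "-j", str(jobs),
        "-J", tasks_file,
        "-T", str(timeout),
    ]


def all_tasks_assigned(output):
    """True if the captured output contains the success message."""
    return "All tasks have been assigned!" in output


command = reserve_command(
    "ce-1.grid.seed:2119/jobmanager-lcgpbs-gridseed", "gridseed", 4, "tasks.txt", 500
)
print(" ".join(command))
```

The list can be passed to subprocess.run(command, capture_output=True, text=True) from inside the reserve_smp_nodes-1.5 directory, and the captured stdout handed to all_tasks_assigned.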

- Monitor the status of the jobs on the grid either by periodically searching the SE for the output or by using the glite-wms-job-status command on the jobs_id.txt file (keep in mind that it contains the IDs of all the jobs you have submitted, not just the ones running!).
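
Since jobs_id.txt accumulates every job you have submitted, it can help to list its entries before querying their status. The sketch below assumes that the file holds one job ID per line and that job IDs are https:// URLs, with any other line being a comment or header. This layout is an assumption, so check it against your own file. The function name read_job_ids is hypothetical.

```python
def read_job_ids(text):
    """Return the unique job IDs found in the text of a job ID file, in order."""
    job_ids = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: job IDs are https:// URLs; everything else is skipped.
        if line.startswith("https://") and line not in job_ids:
            job_ids.append(line)
    return job_ids
```

With the file loaded through open("jobs_id.txt").read(), len(read_job_ids(...)) tells you how many distinct jobs the status command will report on.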

- Collect the output (as done before for the serial run) and the logs (this time they are present only on the SE! glite-wms-job-get-output will not work!) and compare the performance with the explicit jdl submission (single threaded).
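
A common way to express such a comparison is through speedup (serial time divided by parallel time) and efficiency (speedup divided by the number of workers). The sketch below assumes you read the two wall-clock times, in seconds, from the profiling data yourself. The function name and the numbers in the usage note are purely illustrative.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, workers):
    """Return (speedup, efficiency) of a parallel run against a serial one."""
    if parallel_seconds <= 0 or workers <= 0:
        raise ValueError("times and worker count must be positive")
    speedup = serial_seconds / parallel_seconds
    # Efficiency of 1.0 would mean perfect scaling over the workers.
    return speedup, speedup / workers
```

For instance, a serial run of 120 seconds against a 2-way run of 75 seconds gives a speedup of 1.6 and an efficiency of 0.8.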

Step 2: the MPI example

This step is very similar to the first one, but it uses the MPI support in the QE binaries and therefore requires some extra preparation.

- Enter the pw_mpishmem directory.

- Download the following files:

 http://www.escience-lab.org/software/dimeo/cost_school/simple_espresso/bin_shmem.tar.gz
 http://www.escience-lab.org/software/dimeo/relocatable_mpich_shmem.tar.gz

Then upload them to an SE using globus-url-copy or uberftp (you can use the same location used for the threaded example).

- Change run_example_mpishmem.py so that it points to the right locations (lines 26, 27, 32, 33, 38 and 43): keep in mind that you are about to re-use the pseudopotentials and input of the previous example.

- As before, also change the Arguments field in the jdl (the first value should be the same as in the previous jdl) and submit it (thus in a serial fashion), both to check that everything is in order and to have a baseline for the performance.

- As before, collect the results using uberftp and the logs using glite-wms-job-get-output and examine them. In particular, compare them with the logs and output of the jdl submission in the thread section and check whether the time spent by the single-processor version of the code matches the time spent by the MPI version using one processor.

- You already have the reserve_smp_nodes package, so you don't need to download it again: run the code in the same way you did for the threads example, changing only the path to the script and the output location.

- Now that you have both results, from MPI shmem and from automatic thread optimization, compare them and identify the differences.