Brief notes on how to submit jobs to a Condor pool using GridSAM
Introduction
These notes are intended to supplement the GridSAM documentation. The best place to start is the Quick Start Guide, following on from the sections on GridSAM installation. The Quick Start Guide doesn't cover the use of Condor, which is dealt with in the Deployment Guide. I didn't test anything in the "Advance Job Submission" section of the Quick Start Guide, which covers transferring input and output files via FTP or HTTP.
Job descriptions
The Job Submission Description Language (JSDL) document that I used to test GridSAM is shown in the attachment Test1.xml. The JSDL specification can be downloaded from the project Web site (https://forge.gridforum.org/projects/jsdl-wg/document/draft-ggf-jsdl-spec/en/28). Note that the WorkingDirectory line in the JSDL document is commented out, because I did not manage to get this feature to work. That is why full paths are given for all the files involved in the job. The executable file, test, is a C program that writes out all the character strings that it finds in the command line arguments and in the standard input stream. Test1.xml specifies that file test1.in is sent to the standard input stream, and that the standard output stream is written to test1.out.
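As a rough illustration of the kind of job description discussed above, the following Python sketch builds a JSDL document that gives full paths for the executable, standard input and standard output, and leaves out WorkingDirectory. It is not a copy of Test1.xml. The namespace URIs are those of the JSDL 1.0 specification and are an assumption here, since the GridSAM release used for these notes may expect a different draft namespace. The function name build_jsdl and the example paths are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the JSDL 1.0 specification. ASSUMPTION: the
# GridSAM version in use may expect a different (draft) namespace, so
# compare these against Test1.xml before relying on them.
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(executable, stdin_path, stdout_path, arguments=()):
    """Return a JSDL document (as a string) that uses full paths throughout."""
    ET.register_namespace("jsdl", JSDL)
    ET.register_namespace("jsdl-posix", POSIX)
    job = ET.Element(f"{{{JSDL}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")
    app = ET.SubElement(desc, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = arg
    ET.SubElement(posix, f"{{{POSIX}}}Input").text = stdin_path
    ET.SubElement(posix, f"{{{POSIX}}}Output").text = stdout_path
    # WorkingDirectory is deliberately omitted, which is why every path
    # passed in should be a full path.
    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(build_jsdl("/home/user/gridsam/test",
                     "/home/user/gridsam/test1.in",
                     "/home/user/gridsam/test1.out",
                     arguments=["hello", "world"]))
```

The output can be saved to a file and compared with Test1.xml before being submitted.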
Submitting Jobs to a Condor Pool
The default launching mechanism for GridSAM is fork, where the job is simply launched as a process on the machine where the job was submitted. To change the launch mechanism to Condor, the file jobmanager.xml must be changed. This configuration file can be found in the following subdirectory of the OMII Server home directory:
jakarta-tomcat-5.0.25/webapps/gridsam/WEB-INF/classes
Sample files for all the launching mechanisms supported by GridSAM can also be found in this directory. The original version of the sample jobmanager file for Condor, jobmanager-condor.xml, has an important omission: it lacks the line specifying the path to the spooler directory used by GridSAM during the execution of the job. The spooler directory is referred to in the Condor section of the GridSAM Deployment Guide (http://gridsam.sourceforge.net/1.1/deploymentguide/condor.html), but the corresponding line is missing from the sample file. The jobmanager file I used to test GridSAM with Condor is given in the attachment jobmanager-condor.xml, which has the spooler directory line added. To use this file it must be copied to a file named jobmanager.xml.
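The copy step can be scripted. The following is a minimal Python sketch that assumes the classes directory quoted above sits under the OMII Server home directory. The function name activate_jobmanager and the backup file name jobmanager.xml.bak are hypothetical. It keeps a copy of the previous jobmanager.xml so that the fork configuration can be restored.

```python
import shutil
from pathlib import Path

# Subdirectory of the OMII Server home directory quoted in these notes.
CLASSES_SUBDIR = Path("jakarta-tomcat-5.0.25/webapps/gridsam/WEB-INF/classes")


def activate_jobmanager(omii_home, source):
    """Copy `source` over jobmanager.xml, backing up any existing file.

    Returns the path of the active configuration file.
    """
    classes_dir = Path(omii_home) / CLASSES_SUBDIR
    active = classes_dir / "jobmanager.xml"
    if active.exists():
        # Hypothetical backup name, chosen for this sketch only.
        shutil.copy2(active, classes_dir / "jobmanager.xml.bak")
    shutil.copy2(source, active)
    return active
```

A call such as activate_jobmanager("/path/to/omii", "jobmanager-condor.xml") would then put the Condor configuration in place. Both arguments are placeholders.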
Note that each time jobmanager.xml is changed, the OMII server must be restarted by running the stopomii.sh and startomii.sh scripts.
To submit a job to a remote Condor pool, an extra section relating to ssh must be added to jobmanager.xml, as in the attachment jobmanager-condor_ssh.xml.
Note that all of the paths (except the path to the private ssh key file at the bottom) now refer to the remote Condor submit host, which is called Machine C in the diagrams in the GridSAM documentation. Before these tests were carried out, my public ssh key had already been added to ~/.ssh/authorized_keys on the remote Condor submit machine, enabling me (or my GridSAM processes) to log on to that machine without entering a password.
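Before pointing GridSAM at the remote host, it may be worth confirming that key-based login really works without a prompt. The sketch below uses the standard OpenSSH client with BatchMode=yes, which makes ssh fail rather than ask for a password. The function names, host name and key path are hypothetical. GridSAM may use its own SSH implementation, so a successful check only confirms the key setup on the remote side, not GridSAM's ssh configuration.

```python
import subprocess


def ssh_check_command(host, key_file=None):
    """Build an ssh command that fails instead of prompting for a password."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10"]
    if key_file:
        cmd += ["-i", key_file]
    # Run the trivial remote command `true`; exit status 0 means login worked.
    return cmd + [host, "true"]


def can_login_without_password(host, key_file=None):
    """Return True if a non-interactive ssh login to `host` succeeds."""
    result = subprocess.run(ssh_check_command(host, key_file),
                            capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical host and key path, for illustration only.
    print(ssh_check_command("user@condor-submit.example.org",
                            "/home/user/.ssh/id_rsa"))
```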
I should admit that this remote Condor submission test didn't quite work. The gridsam-status command showed an error relating to the transfer of the executable file to the remote machine, so the job was never actually submitted to the remote Condor pool. I did manage to run a job on the remote machine using ssh only, by using the jobmanager file in attachment jobmanager-ssh.xml, which has the same ssh settings as jobmanager-condor_ssh.xml.
--
DanBretherton - 24 Feb 2006