Configuration of Styx Grid Services

In order to run a Styx Grid Services server, you will need to create a configuration file in XML. This section describes the format of this XML file and gives some examples of how to set up Styx Grid Services.

Overall structure

The overall structure of the XML file is quite simple. If you are not familiar with XML, don't worry. XML files are just text files with a defined structure. Important bits of information are placed between tags like so: <name>Joe Bloggs</name>. If this reminds you of HTML, there's a good reason for this. Modern, well-structured HTML (known as XHTML) is actually a "flavour" of XML.

The configuration file is described by a Document Type Definition (DTD). The DTD specification for the SGS configuration file is found in conf/SGSconfig.dtd. You don't need to worry about this: it is used internally by the SGS software to make sure that the configuration file is valid. If you create an invalid configuration file, this will be detected when you try to run the SGS server and an error message will appear.

The large-scale structure of the configuration file looks like this:

    <sgs>
    
      <server address="sgs.myserver.com" port="9092" cacheLocation="C:\StyxGridServices">
         ...
      </server>
      
      <gridservices>
        <gridservice name="mysgs" ... ></gridservice>
        <gridservice name="anotherSGS" ...></gridservice>
        ...
      </gridservices>
      
    </sgs>
    

Everything is contained between <sgs> and </sgs> tags. The information between the <server> tags specifies the server settings. The <server> tag itself has three possible attributes:

Attribute | Possible values | Default value | Purpose
address | Hostname or IP address | Auto-detected | This attribute specifies the address (hostname or IP address) of the server from the point of view of clients (i.e. the public address). It is optional: if it is omitted or left blank, the system will attempt to detect the server's IP address using Java's InetAddress.getLocalHost().getHostAddress() method.
port | Integer between 256 and 65535 inclusive | 9092 | This attribute sets the port on which the server will listen. The port number must not be in use by any other process and the user running the server must have permission to use this port (on many systems including Unix, only the root user is allowed to use ports with numbers less than 1024). If this attribute is omitted or left blank, port 9092 will be used by default.
cacheLocation | Valid directory location | $HOME/StyxGridServices | The directory on the server that will be used to store information about all the services. This directory is used for cached files, state data and other things. It will be created when the server starts if it does not already exist. You (i.e. the user running the server process) must have write permissions in this directory. If this attribute is omitted or left blank, the system will use or create a directory called StyxGridServices in the user's home directory. (The user's home directory is found using Java's user.home system property.)

Note that the <server> tag can be omitted from the config file altogether. In this case, default values will be chosen for all attributes and the server will be unsecured.
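
Because every attribute of the <server> tag is optional, it should be possible to override just one of them. The following is only a sketch: the port number 8080 is an arbitrary value chosen for illustration, and the contents of the <server> and <gridservices> tags are elided exactly as in the outline above.

```xml
<sgs>

  <!-- Only the port is set here, so address and cacheLocation
       fall back to the defaults listed in the table above. -->
  <server port="8080">
     ...
  </server>

  <gridservices>
    ...
  </gridservices>

</sgs>
```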

The contents of the <gridservices> tag are explained in the following sections.

Configuration of a Styx Grid Service

The <gridservices> tag is a container for all the <gridservice> tags. There is one <gridservice> tag for each Styx Grid Service that the server exposes. This tag contains all the information about the executable that the SGS is wrapping: the path to the executable, the command-line parameters that it expects, the input files it consumes, the output files it creates, plus some other things. The structure of the <gridservice> tag and its sub-tags looks like this:

    <gridservice name="mysgs" command="C:\path\to\executable"
        description="A Styx Grid Service">
      <params>
        ...
      </params>
      <inputs>
        ...
      </inputs>
      <outputs>
        ...
      </outputs>
      <serviceData>
        ...
      </serviceData>
      <steering>
        ...
      </steering>
      <docs>
        ...
      </docs>
    </gridservice>
    

The <gridservice> tag itself has three attributes. The name attribute gives a short name that identifies the SGS. This name must be different from the names of all the other SGSs on this server and cannot contain spaces. The command attribute specifies the full path to the executable that will be run. A short, one-sentence description of the SGS can be placed in the optional description attribute.

The sub-tags (children) of the <gridservice> tag specify different aspects of the Styx Grid Service. Most SGSs will only require a few of these tags to be used, as we shall see. We shall now go through each of these tags in turn and describe how to use them.

Parameters

Parameters are values that are set before an SGS is run. In the current system the parameters translate directly into the command-line arguments for the underlying executable. The parameters are specified between the <params> tags. This is perhaps the most complicated part of the SGS configuration but hopefully you'll see that it's not too difficult. The <params> tag is a container for zero or more <param> tags. There is one <param> tag for each command-line argument that the executable expects.

Each <param> tag must contain a set of attributes:

Attribute | Possible values | Default value | Purpose
name | Plain string, no spaces | None | Unique name for the parameter.
paramType | "switch", "flaggedOption" or "unflaggedOption" | None | Type of the parameter. See below.
required | "yes" or "no" | "no" | Set to "yes" if a value for this parameter must be set. This is irrelevant when paramType="switch".
flag | Single character | None | For switches and flaggedOptions, the short flag used to identify this parameter (e.g. "v" for a parameter that is specified on the command line as "-v").
longFlag | Plain string, no spaces | None | For switches and flaggedOptions, the long flag used to identify this parameter (e.g. "verbose" for a parameter that is specified on the command line as "--verbose").
defaultValue | Plain string | None | Default value for the parameter. If this is set, the "required" attribute is ignored: if the user does not set a value for a parameter, this default value will be used instead.
greedy | "yes" or "no" | "no" | Only meaningful for unflaggedOptions. See below.

(The Java Simple Argument Parser, JSAP, is used to handle command line arguments in both the SGS server and client code. Therefore, the nomenclature used here reflects that used in JSAP.) Most of the attributes are explained adequately (I hope) in the above table. However, some attributes require further explanation:

There are three parameter types that the SGS system understands. They are named after the differing means of specifying their values on a command line through the use of arguments:

  • A switch is a parameter that can either be true or false. On the command line (i.e. when running the executable outside the SGS framework) switches are arguments (flags) such as "-v" and "--verbose" whose presence alone is significant. They do not have an associated value.
  • A flaggedOption is like a switch but it has an associated value. On the command line this value is given by the token after the flag: e.g. "-n 5" or "--number=5" are two different ways of setting the value of a certain parameter to 5.
  • An unflaggedOption is a parameter whose value is given purely by the position of its associated argument on the command line, relative to other unflaggedOptions. The final unflaggedOption can be marked as greedy, meaning that it will consume the remainder of the command line.

It probably helps to look at some examples here. Let's say that we are wrapping an executable that reads a single input file and writes a single output file. The name of the input file is signified on the command line by the short flag "-i" or the long flag "--inputfile". The name of the output file is signified by the short flag "-o" or the long flag "--outputfile". Both of these arguments are compulsory. The <params> tag in the configuration file would look like this:

    <params>
      <param name="inputfile" paramType="flaggedOption" required="yes" 
          flag="i" longFlag="inputfile"/>
      <param name="outputfile" paramType="flaggedOption" required="yes"
          flag="o" longFlag="outputfile"/>
    </params>
    

The usage of this executable is myprog -i <inputfile> -o <outputfile>.

Now let's look at an example in which we are wrapping an executable that reads a number of input files and writes a single output file. In this case, there are no command-line flags to help us: the first argument on the command line gives the name of the output file and the remaining arguments are the names of all the input files that must be read. The <params> tag in the configuration file would look like this:

    <params>
      <param name="outputfile" paramType="unflaggedOption" required="yes"/>
      <param name="inputfiles" paramType="unflaggedOption" required="yes" greedy="yes"/>
    </params>
    

This time both parameters are unflaggedOptions (parameters whose value is found by looking at a certain position on the command line). The first argument gives the name of the output file and the remaining arguments are consumed by the inputfiles parameter, which is set to be greedy.

As a final example, let's pretend that we are wrapping an executable (called replace) that searches for all instances of a certain string in a file and replaces those instances with another string. In addition, the user can tell the program to print verbose debug information by specifying the argument "-v". Here is an example of running this executable from the command line:

    replace -i input.dat -o output.dat Hello Goodbye -v
    

This would replace all instances of "Hello" in the file input.dat with the string "Goodbye" and write the result to output.dat, whilst printing verbose debug messages. The <params> tag in the configuration file would look like this:

    <params>
      <param name="verbose" paramType="switch" flag="v"/>
      <param name="inputfile" paramType="flaggedOption" required="yes" flag="i"/>
      <param name="outputfile" paramType="flaggedOption" required="yes" flag="o"/>
      <param name="stringToFind" paramType="unflaggedOption" required="yes"/>
      <param name="stringToReplace" paramType="unflaggedOption" required="yes"/>
    </params>
    

Note that only the order of the unflaggedOptions is important. Switches and flaggedOptions can be placed anywhere on the command line and can be specified in any position between the <params> tags.
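
None of the examples above uses the defaultValue attribute. As a rough sketch, an optional flaggedOption with a default might be declared as follows. The parameter name "number", its flags and the value 5 are hypothetical and chosen only to match the flaggedOption description earlier in this section.

```xml
<params>
  <!-- Hypothetical parameter, set with the short flag n or the long flag number.
       If the user does not set a value, the default of 5 is used. -->
  <param name="number" paramType="flaggedOption" flag="n" longFlag="number"
      defaultValue="5"/>
</params>
```

Since a default value is given, the required attribute would be ignored for this parameter, so it is simply left out.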

Inputs

Having specified the parameters that the executable expects, you'll be glad to know that we've done most of the hard work. The next thing we specify in the configuration file is the set of inputs from which the executable will read. An executable (and therefore a Styx Grid Service) can read input data either from its standard input stream or from files. In the case of files, the names of these files are either fixed or they can be set using a parameter (see above). The inputs are specified between the <inputs> tags in the configuration file.

The <inputs> tag is a container for zero or more <input> tags, with one <input> tag for each file or stream that provides input data. Each <input> tag contains exactly two attributes:

Attribute | Possible values | Default value | Purpose
type | "stream", "file" or "fileFromParam" | "file" | Type of the input. If type="stream" then the name must be "stdin". If the name of the file is fixed then type="file". If the name of the file is specified by a command-line argument, then type="fileFromParam".
name | If type="stream" then name must be "stdin". If type="fileFromParam" then name must be equal to the name of one of the parameters and that parameter must not be a switch. If type="file", the name can be any string. | None | The name of the file or stream, or the parameter through which the name is specified.

All file names are specified relative to the working directory of the executable. This may seem a little confusing, and indeed the design here is probably not optimal. However, hopefully some examples will clear things up. We'll look at some examples when we've dealt with the <outputs> section of the configuration file.

Outputs

Output files and streams are specified in a very similar way to input files. An executable can output data as files or on one of its standard streams (stdout and stderr). In the case of output files, the names of these files can be fixed or specified by the value of a parameter.

The <outputs> tag is a container for zero or more <output> tags, with one <output> tag for each file or stream that contains output data. Each <output> tag contains exactly two attributes:

Attribute | Possible values | Default value | Purpose
type | "stream", "file" or "fileFromParam" | "file" | Type of the output. If type="stream" then the name must be "stdout" or "stderr". If the name of the file is fixed then type="file". If the name of the file is specified by a command-line argument, then type="fileFromParam".
name | If type="stream" then name must be "stdout" or "stderr". If type="fileFromParam" then name must be equal to the name of one of the parameters, and that parameter must not be a switch. If type="file", the name can be any string. | None | The name of the file or stream, or the parameter through which the name is specified.

All file names are specified relative to the working directory of the executable.
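
To illustrate the fixed-name case, here is a minimal sketch for an executable that always reads one file and writes another, and also prints messages to its standard output. The file names input.dat and output.dat are hypothetical; both are interpreted relative to the working directory of the executable.

```xml
<inputs>
  <!-- Fixed file name, so type="file" (which is also the default type) -->
  <input type="file" name="input.dat"/>
</inputs>

<outputs>
  <!-- One fixed-name output file plus the standard output stream -->
  <output type="file" name="output.dat"/>
  <output type="stream" name="stdout"/>
</outputs>
```

No <params> section is needed for these files, because their names do not depend on any command-line argument.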

Service Data

Service data is information about the state of a particular Styx Grid Service instance. For example, the status of a service is represented by a service data element (SDE), which can contain values such as "created", "running" and "finished". The "status" SDE is built in to the system and the user does not need to specify it in the configuration file. It is possible for users to create their own service data elements but this is considered an "advanced" topic and will not be described here (yet).

Steerable parameters

With some programs (e.g. fluid dynamics simulations) it is possible to adjust the values of some parameters while the program is running. The <steering> section of the configuration file allows this to be set up, but again this is an "advanced" (and rarely-used) topic and will not be described here at the moment.

Documentation

The Styx Grid Service framework allows service providers to provide access to free-form documentation about each service. This is specified between the <docs> tags in the configuration file. The <docs> tag is a container for zero or more <doc> tags. Each <doc> tag is a file or directory that contains documentation: if it represents a directory then all the files under that directory will be exposed for reading by clients.

The specification of the <doc> tag is very simple:

Attribute | Possible values | Default value | Purpose
location | Valid path | None | Full path to the documentation file or directory.
name | Plain string | None | (Optional) An alias for the name of the file or directory. The value of this attribute will be used as the name of the file from the point of view of clients.

For example, let's say that we want to expose two documentation elements. The first is a directory of documentation files (say, a set of Word documents that describe the operation of the executable). The second is a simple one-paragraph description of the executable that is called "description.txt" in real life, but we want to expose it with the name "README". The documentation part of the configuration file would look like this:

    <docs>
      <doc location="c:\myprog\docs\"/>
      <doc location="c:\myprog\description.txt" name="README"/>
    </docs>
    

Examples

OK, we've gone through the nuts and bolts of the Styx Grid Service configuration file in some detail. Let's put it all together with a couple of examples. The sections you will have to worry about most are the parameters and the input and output files. Other sections are used a lot less, so it is those three sections which we shall focus on here.

Example 1: a simple Unix filter

As a first example, let's look at how we expose a very simple program as a Styx Grid Service. We'll take the example of the md5sum program, a program found on most Unix-like systems. The md5sum program reads data from its standard input and calculates a "digest" of the data in the form of a large number which is printed out (usually as a hexadecimal string) to its standard output. (The MD5 digest is usually used as a "checksum": the MD5 digest of a file is a large number that is highly unlikely to have been produced by any other file.) Programs that behave in this way (i.e. that read data from standard input and write data to standard output) are sometimes known as filters.

The entire configuration file that is required to expose the md5sum program as a Styx Grid Service is as follows (the first two lines just declare that this is an XML file and that it conforms to the specification given in SGSconfig.dtd):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE sgs SYSTEM "SGSconfig.dtd">
      
    <sgs>
      
      <gridservices>
      
        <gridservice name="md5sum" command="/usr/bin/md5sum"
            description="Calculates the MD5 checksum of data that are read from standard input">
          
          <inputs>
            <input type="stream" name="stdin"/>
          </inputs>
          
          <outputs>
            <output type="stream" name="stdout"/>
            <output type="stream" name="stderr"/>
          </outputs>
          
        </gridservice>
        
      </gridservices>
      
    </sgs>
    

Working down this file: The <server> tag is omitted and so default values are chosen for the server settings. We specify a single Styx Grid Service called md5sum and specify the full path to the executable that we are wrapping. The SGS takes no parameters, but reads data from its standard input and writes data to its standard output and standard error streams.

Example 2: replace

In the "Parameters" section above we specified the parameters taken by an executable that reads an input file, replaces all instances of one string with another, then writes the resulting output file. We've actually already done the hardest bit of creating the configuration file in this case: all we need to do now is to specify the input and output file in the configuration document. The information below must be placed within the <gridservices> tag in a complete configuration file such as that given in example 1 above:

    <gridservice name="replace" command="C:\path\to\replace.exe"
        description="Finds and replaces a string in a file">
            
      <params>
        <param name="verbose" paramType="switch" flag="v"/>
        <param name="inputfile" paramType="flaggedOption" required="yes" flag="i"/>
        <param name="outputfile" paramType="flaggedOption" required="yes" flag="o"/>
        <param name="stringToFind" paramType="unflaggedOption" required="yes"/>
        <param name="stringToReplace" paramType="unflaggedOption" required="yes"/>
      </params>
          
      <inputs>
        <input type="fileFromParam" name="inputfile"/>
      </inputs>
          
      <outputs>
        <output type="fileFromParam" name="outputfile"/>
      </outputs>
          
    </gridservice>
    

We've already described the parameters above. All we have done here is to state that the executable expects one input file, whose name will be given by the value of the parameter called "inputfile". Furthermore, we state that the executable writes a single output file, whose name is given by the value of the parameter called "outputfile".
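
As a sketch of how the two examples fit together, a single configuration file exposing both services might look like the following. It assumes a Unix-like server so that both executables can live on the same machine, and /path/to/replace is a placeholder for wherever the replace executable is actually installed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sgs SYSTEM "SGSconfig.dtd">

<sgs>

  <!-- No <server> tag, so default server settings are used -->
  <gridservices>

    <!-- Example 1: a filter that reads stdin and writes stdout and stderr -->
    <gridservice name="md5sum" command="/usr/bin/md5sum"
        description="Calculates the MD5 checksum of data that are read from standard input">
      <inputs>
        <input type="stream" name="stdin"/>
      </inputs>
      <outputs>
        <output type="stream" name="stdout"/>
        <output type="stream" name="stderr"/>
      </outputs>
    </gridservice>

    <!-- Example 2: file names are taken from the values of two parameters -->
    <gridservice name="replace" command="/path/to/replace"
        description="Finds and replaces a string in a file">
      <params>
        <param name="verbose" paramType="switch" flag="v"/>
        <param name="inputfile" paramType="flaggedOption" required="yes" flag="i"/>
        <param name="outputfile" paramType="flaggedOption" required="yes" flag="o"/>
        <param name="stringToFind" paramType="unflaggedOption" required="yes"/>
        <param name="stringToReplace" paramType="unflaggedOption" required="yes"/>
      </params>
      <inputs>
        <input type="fileFromParam" name="inputfile"/>
      </inputs>
      <outputs>
        <output type="fileFromParam" name="outputfile"/>
      </outputs>
    </gridservice>

  </gridservices>

</sgs>
```

Each service has its own <gridservice> tag with a distinct name, as required for services on the same server.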