In order to run a Styx Grid Services server, you will need to create a configuration file in XML. This section describes the format of this XML file and gives some examples of how to set up Styx Grid Services.
The overall structure of the XML file is quite simple. If you are not
familiar with XML, don't worry. XML files are just text files with a
defined structure. Important bits of information are placed between tags
like so: <name>Joe Bloggs</name>. If this reminds
you of HTML, there's a good reason for this. Modern, well-structured HTML
(known as XHTML) is actually a "flavour" of XML.
The configuration file is described by a Document Type Definition (DTD).
The DTD specification for the SGS configuration file is found in
conf/SGSconfig.dtd. You don't need to worry about this: it
is used internally by the SGS software to make sure that the configuration
file is valid. If you create an invalid configuration file, this will be
detected when you try to run the SGS server and an error message will appear.
The large-scale structure of the configuration file looks like this:
<sgs>
<server address="sgs.myserver.com" port="9092" cacheLocation="C:\StyxGridServices">
...
</server>
<gridservices>
<gridservice name="mysgs" ... ></gridservice>
<gridservice name="anotherSGS" ...></gridservice>
...
</gridservices>
</sgs>
Everything is contained between <sgs> and </sgs>
tags. The information between the <server> tags specifies
the server settings. The <server> tag itself has three possible
attributes:
| Attribute | Possible values | Default value | Purpose |
|---|---|---|---|
| address | Hostname or IP address | Auto-detected | This attribute is used to specify the address (hostname or IP address) of the server from the point of view of clients (i.e. the public address). It is an optional attribute: if it is omitted or left blank, the system will attempt to detect the server's IP address using Java's InetAddress.getLocalHost().getHostAddress() method. |
| port | Integer between 256 and 65535 inclusive | 9092 | This attribute is used to set the port on which the server will listen. The port number must not be in use by any other process and the user running the server must have permission to use this port (on many systems, including Unix, only the root user is allowed to use ports with numbers less than 1024). If this attribute is omitted or left blank, port 9092 will be used by default. |
| cacheLocation | Valid directory location | $HOME/StyxGridServices | The value of this attribute is the directory on the server that will be used to store information about all the services. This directory is used for cached files, state data and other things. This directory will be created when the server starts if it does not already exist. You (i.e. the user running the server process) must have write permissions in this directory. If this attribute is omitted or left blank, the system will use or create a directory called StyxGridServices in the user's home directory. (The user's home directory is found using Java's user.home system property.) |
The <server> tag can be omitted from the config
file altogether. In this case, default values will be chosen for all attributes
and the server will be unsecured.
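As a rough illustration of these defaults, the sketch below sets only the port attribute and leaves out address and cacheLocation. Going by the table above, the address would then be auto-detected and the cache directory would fall back to StyxGridServices in the user's home directory. The port number 9093 is an arbitrary value chosen for this example, and the contents of the <server> tag are left elided, as in the outline above.

```xml
<sgs>
  <!-- Only the port is set explicitly (9093 is an arbitrary example value).
       address is omitted, so the server tries to detect its own IP address.
       cacheLocation is omitted, so $HOME/StyxGridServices is used. -->
  <server port="9093">
    ...
  </server>
  <gridservices>
    ...
  </gridservices>
</sgs>
```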
The contents of the <gridservices> tag are explained in the following
sections.
The <gridservices> tag is a container for all the
<gridservice> tags. There is one <gridservice>
tag for each Styx Grid Service that the server exposes. This tag contains all
the information about the executable that the SGS is wrapping: the
path to the executable, the command-line parameters that it expects, the
input files it consumes, the output files it creates, plus some other things.
The structure of the <gridservice> tag and its sub-tags
looks like this:
<gridservice name="mysgs" command="C:\path\to\executable"
description="A Styx Grid Service">
<params>
...
</params>
<inputs>
...
</inputs>
<outputs>
...
</outputs>
<serviceData>
...
</serviceData>
<steering>
...
</steering>
<docs>
...
</docs>
</gridservice>
The <gridservice> tag itself has three attributes. The
name attribute gives a short name for the SGS that will
be used to identify it. This name must be different from the names of all the
other SGSs on this server. This name cannot contain spaces. The command
attribute specifies the full path to the executable that will be run. A short,
one-sentence description of the SGS can be placed in the optional
description attribute.
The sub-tags (children) of the <gridservice> tag specify
different aspects of the Styx Grid Service. Most SGSs will only require a
few of these tags to be used, as we shall see. We shall now go through
each of these tags in turn and describe how to use them.
Parameters are values that are set before an SGS is run. In the current system
the parameters translate directly into the command-line arguments for the
underlying executable. The parameters are specified
between the <params> tags. This is perhaps the most
complicated part of the SGS configuration but hopefully you'll see that it's
not too difficult. The <params> tag is a container for
zero or more <param> tags. There is one <param>
tag for each command-line argument that the executable expects.
Each <param> tag is described by a set of attributes, not all of which are required:
| Attribute | Possible values | Default value | Purpose |
|---|---|---|---|
| name | plain string, no spaces | None | Unique name for the parameter |
| paramType | "switch", "flaggedOption" or "unflaggedOption" | None | Type of the parameter. See below. |
| required | "yes" or "no" | "no" | Set to "yes" if a value for this parameter must be set. This is irrelevant when paramType="switch". |
| flag | single character | None | For switches and flaggedOptions, the short flag used to identify this parameter (e.g. "v" for a parameter that is specified on the command line as "-v") |
| longFlag | plain string, no spaces | None | For switches and flaggedOptions, the long flag used to identify this parameter (e.g. "verbose" for a parameter that is specified on the command line as "--verbose") |
| defaultValue | plain string | None | Default value for the parameter. If this is set, the "required" attribute is ignored: if the user does not set a value for a parameter, this default value will be used instead |
| greedy | "yes" or "no" | "no" | Only meaningful for unflaggedOptions. See below. |
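(The Java Simple Argument Parser, JSAP, is used to handle command-line arguments in both the SGS server and client code, so the nomenclature used here reflects that used in JSAP.) Most of the attributes are explained adequately (I hope) in the above table. However, the paramType and greedy attributes require further explanation.
There are three parameter types that the SGS system understands. They are named after the way in which their values are specified on a command line. A switch is a flag that is either present or absent and takes no value (e.g. "-v"). A flaggedOption is a value that follows its flag on the command line (e.g. "-i input.dat"). An unflaggedOption is a value that is identified only by its position on the command line. An unflaggedOption that is marked as greedy consumes all the remaining unflagged arguments.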
It probably helps to look at some examples here. Let's say that we are
wrapping an executable that reads a single input file and writes a single
output file. The name of the input file is signified on the command line by the short flag "-i"
or the long flag "--inputfile". The name of the output file is
signified by the short flag "-o" or the long flag "--outputfile".
Both of these arguments are compulsory.
The <params> tag in the configuration file would look like this:
<params>
<param name="inputfile" paramType="flaggedOption" required="yes"
flag="i" longFlag="inputfile"/>
<param name="outputfile" paramType="flaggedOption" required="yes"
flag="o" longFlag="outputfile"/>
</params>
The usage of this executable is myprog -i <inputfile> -o <outputfile>.
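The defaultValue attribute is not exercised by the example above, so here is a hedged variation. It assumes, purely for illustration, that the output file is optional and that the SGS should fall back to the name output.dat when the user does not supply one. The file name is a made-up value.

```xml
<params>
  <param name="inputfile" paramType="flaggedOption" required="yes"
         flag="i" longFlag="inputfile"/>
  <!-- No required attribute here: when defaultValue is set, "required" is
       ignored, and output.dat (a hypothetical name) is used if the user
       gives no value. -->
  <param name="outputfile" paramType="flaggedOption"
         flag="o" longFlag="outputfile" defaultValue="output.dat"/>
</params>
```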
Now let's look at an example in which we are wrapping an executable that
reads a number of input files and writes a single output file. In this case,
there are no command-line flags to help us: the first argument on the command
line gives the name of the output file and the remaining arguments are the names
of all the input files that must be read.
The <params> tag in the configuration file would look like this:
<params>
<param name="outputfile" paramType="unflaggedOption" required="yes"/>
<param name="inputfiles" paramType="unflaggedOption" required="yes" greedy="yes"/>
</params>
This time both parameters are unflaggedOptions (parameters whose value is found
by looking at a certain position on the command line). The first argument
gives the name of the output file and the remaining arguments are consumed
by the inputfiles parameter, which is set to be greedy.
As a final example, let's pretend that we are wrapping an executable (called replace)
that searches for all instances of a certain string in a file and replaces those instances
with another string. In addition, the user can tell the program to print
verbose debug information by specifying the argument "-v". Here is an example
of running this executable from the command line:
replace -i input.dat -o output.dat Hello Goodbye -v
This would replace all instances of "Hello" in the file input.dat
with the string "Goodbye" and write the result to output.dat,
whilst printing verbose debug messages.
The <params> tag in the configuration file would look like this:
<params>
<param name="verbose" paramType="switch" flag="v"/>
<param name="inputfile" paramType="flaggedOption" required="yes" flag="i"/>
<param name="outputfile" paramType="flaggedOption" required="yes" flag="o"/>
<param name="stringToFind" paramType="unflaggedOption" required="yes"/>
<param name="stringToReplace" paramType="unflaggedOption" required="yes"/>
</params>
Note that only the order of the unflaggedOptions is important. Switches
and flaggedOptions can be placed anywhere on the command line and can be
specified anywhere between the <params> tags.
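To illustrate, the following arrangement should be equivalent to the one above, going by the rule just stated. The switch and the flaggedOptions have been moved around, but stringToFind still comes before stringToReplace.

```xml
<params>
  <!-- The unflaggedOptions keep their relative order: find first, then replace. -->
  <param name="stringToFind" paramType="unflaggedOption" required="yes"/>
  <param name="inputfile" paramType="flaggedOption" required="yes" flag="i"/>
  <param name="stringToReplace" paramType="unflaggedOption" required="yes"/>
  <!-- Switches and flaggedOptions may appear at any position in this list. -->
  <param name="outputfile" paramType="flaggedOption" required="yes" flag="o"/>
  <param name="verbose" paramType="switch" flag="v"/>
</params>
```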
Having specified the parameters that the executable expects, you'll be glad
to know that we've done most of the hard work. The next thing we specify in
the configuration file is the set of inputs from which the executable will read.
An executable (and therefore a Styx Grid Service) can read input data either
from its standard input stream or from files. In the case of files, the names
of these files are either fixed or they can be set using a parameter (see above).
The inputs are specified between the <inputs> tags in the
configuration file.
The <inputs> tag is a container for
zero or more <input> tags, with one <input>
tag for each file or stream that provides input data.
Each <input> tag contains exactly two attributes:
| Attribute | Possible values | Default value | Purpose |
|---|---|---|---|
| type | "stream", "file" or "fileFromParam" | "file" | Type of the input. If type="stream" then the name must be "stdin". If the name of the file is fixed then type="file". If the name of the file is specified by a command-line argument, then type="fileFromParam". |
| name | If type="stream" then name must be "stdin". If type="fileFromParam" then name must be equal to the name of one of the parameters and that parameter must not be a switch. If type="file", the name can be any string. | None | The name of the file or stream, or the parameter through which the name is specified. |
All file names are specified relative to the working directory of the executable.
This may seem a little confusing, and indeed the design here is probably not
optimal. However, hopefully some examples will clear things up. We'll look at
some examples when we've dealt with the <outputs> section of
the configuration file.
Output files and streams are specified in a very similar way to input files. An executable can output data as files or on one of its standard streams (stdout and stderr). In the case of output files, the names of these files can be fixed or specified by the value of a parameter.
The <outputs> tag is a container for
zero or more <output> tags, with one <output>
tag for each file or stream that contains output data.
Each <output> tag contains exactly two attributes:
| Attribute | Possible values | Default value | Purpose |
|---|---|---|---|
| type | "stream", "file" or "fileFromParam" | "file" | Type of the output. If type="stream" then the name must be "stdout" or "stderr". If the name of the file is fixed then type="file". If the name of the file is specified by a command-line argument, then type="fileFromParam". |
| name | If type="stream" then name must be "stdout" or "stderr". If type="fileFromParam" then name must be equal to the name of one of the parameters, and that parameter must not be a switch. If type="file", the name can be any string. | None | The name of the file or stream, or the parameter through which the name is specified. |
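As a minimal sketch of how the type and name attributes combine, assume a hypothetical executable that reads from standard input and from a file that is always called settings.txt, and that writes a file that is always called results.dat as well as messages on its standard streams. Both file names are invented for illustration.

```xml
<inputs>
  <!-- Streamed input: the name must be "stdin". -->
  <input type="stream" name="stdin"/>
  <!-- Fixed file name, relative to the executable's working directory. -->
  <input type="file" name="settings.txt"/>
</inputs>
<outputs>
  <!-- Fixed output file name (type="file" is also the default type). -->
  <output type="file" name="results.dat"/>
  <output type="stream" name="stdout"/>
  <output type="stream" name="stderr"/>
</outputs>
```

Files whose names are set by the user at run time would use type="fileFromParam" instead, with name set to the name of the relevant parameter.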
Service data is information about the state of a particular Styx Grid Service instance. For example, the status of a service is represented by a service data element (SDE), which can contain values such as "created", "running" and "finished". The "status" SDE is built in to the system and the user does not need to specify it in the configuration file. It is possible for users to create their own service data elements but this is considered an "advanced" topic and will not be described here (yet).
With some programs (e.g. fluid dynamics simulations) it is possible to
adjust the values of some parameters while the program is running. The
<steering> section of the configuration file allows this
to be set up, but again this is an "advanced" (and rarely-used) topic and
will not be described here at the moment.
The Styx Grid Service framework allows service providers to provide access
to free-form documentation about each service. This is specified between
the <docs> tags in the configuration file. The
<docs> tag is a container for zero or more
<doc> tags. Each <doc> tag represents a file
or directory that contains documentation: if it represents a directory then
all the files under that directory will be exposed for reading by clients.
The specification of the <doc> tag is very simple:
| Attribute | Possible values | Default value | Purpose |
|---|---|---|---|
| location | valid path | None | Full path to the documentation file or directory. |
| name | plain string | None | (Optional) An alias for the name of the file or directory. The value of this attribute will be used as the name of the file from the point of view of clients. |
For example, let's say that we want to expose two documentation elements. The first is a directory of documentation files (say, a set of Word documents that describe the operation of the executable). The second is a simple one-paragraph description of the executable that is called "description.txt" in real life, but we want to expose it with the name "README". The documentation part of the configuration file would look like this:
<docs>
<doc location="c:\myprog\docs\"/>
<doc location="c:\myprog\description.txt" name="README"/>
</docs>
OK, we've gone through the nuts and bolts of the Styx Grid Service configuration file in some detail. Let's put it all together with a couple of examples. The sections you will have to worry about most are the parameters and the input and output files. Other sections are used a lot less, so it is those three sections which we shall focus on here.
As a first example, let's look at how we expose a very simple program as a
Styx Grid Service. We'll take the example of the md5sum program,
a program found on most Unix-like systems. The md5sum program
reads data from its standard input and calculates a "digest" of the data
in the form of a large number which is printed out (usually as a hexadecimal
string) to its standard output. (The MD5 digest is usually used as a "checksum":
the MD5 digest of a file is a large number that is highly unlikely to have been
produced by any other file.) Programs that behave in this way (i.e. that read
data from standard input and write data to standard output) are sometimes
known as filters.
The entire configuration file that is required to expose the md5sum
program as a Styx Grid Service is as follows (the first two lines just
declare that this is an XML file and that it conforms to the specification
given in SGSconfig.dtd):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sgs SYSTEM "SGSconfig.dtd">
<sgs>
<gridservices>
<gridservice name="md5sum" command="/usr/bin/md5sum"
description="Calculates the MD5 checksum of data that are read from standard input">
<inputs>
<input type="stream" name="stdin"/>
</inputs>
<outputs>
<output type="stream" name="stdout"/>
<output type="stream" name="stderr"/>
</outputs>
</gridservice>
</gridservices>
</sgs>
Working down this file: the <server> tag is omitted
and so default values are chosen for the
server settings.
We specify a single Styx Grid Service called md5sum and specify
the full path to the executable that we are wrapping. The SGS takes no
parameters, but reads data from its standard input and writes data to its
standard output and standard error streams.
In the "Parameters" section above we specified the parameters taken
by an executable that reads an input file, replaces all instances of one string
with another, then writes the resulting output file. We've actually already done
the hardest bit of creating the configuration file in this case: all we need
to do now is to specify the input and output file in the configuration
document. The information below must be placed within the <gridservices>
tag in a complete configuration file such as that given in example 1 above:
<gridservice name="replace" command="C:\path\to\replace.exe"
description="Finds and replaces a string in a file">
<params>
<param name="verbose" paramType="switch" flag="v"/>
<param name="inputfile" paramType="flaggedOption" required="yes" flag="i"/>
<param name="outputfile" paramType="flaggedOption" required="yes" flag="o"/>
<param name="stringToFind" paramType="unflaggedOption" required="yes"/>
<param name="stringToReplace" paramType="unflaggedOption" required="yes"/>
</params>
<inputs>
<input type="fileFromParam" name="inputfile"/>
</inputs>
<outputs>
<output type="fileFromParam" name="outputfile"/>
</outputs>
</gridservice>
We've already described the parameters above. All we have done here is to state that the executable expects one input file, whose name will be given by the value of the parameter called "inputfile". Furthermore, we state that the executable writes a single output file, whose name is given by the value of the parameter called "outputfile".
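For completeness, here is a sketch of how the fragment above might sit in a whole configuration file. It simply wraps the replace service in the same outer structure as the md5sum example. The <server> tag is again omitted, so default server settings would apply.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sgs SYSTEM "SGSconfig.dtd">
<sgs>
  <gridservices>
    <!-- The service name must be unique on this server and contain no spaces. -->
    <gridservice name="replace" command="C:\path\to\replace.exe"
                 description="Finds and replaces a string in a file">
      <params>
        <param name="verbose" paramType="switch" flag="v"/>
        <param name="inputfile" paramType="flaggedOption" required="yes" flag="i"/>
        <param name="outputfile" paramType="flaggedOption" required="yes" flag="o"/>
        <param name="stringToFind" paramType="unflaggedOption" required="yes"/>
        <param name="stringToReplace" paramType="unflaggedOption" required="yes"/>
      </params>
      <inputs>
        <!-- The input file name is taken from the "inputfile" parameter. -->
        <input type="fileFromParam" name="inputfile"/>
      </inputs>
      <outputs>
        <!-- The output file name is taken from the "outputfile" parameter. -->
        <output type="fileFromParam" name="outputfile"/>
      </outputs>
    </gridservice>
  </gridservices>
</sgs>
```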