Styx Grid Services Tutorial: Creating more complex services

We shall now create some Styx Grid Services from programs that are a little more complex than a Hello World program. The process for doing so is exactly the same:

  1. Create a machine-readable description of the program (the configuration file).
  2. Run the SGS server using this description.
  3. Run the service using the SGS client software.

In this section of the tutorial we shall look at programs that read some input and produce some output.

1. A simple filter

A filter is simply a program that reads data from its standard input and writes to its standard output. Filters are very common in Unix and Linux systems.

We shall create a Styx Grid Service that wraps a filter program that reads lines of text from its standard input, reverses each line and prints the lines to the standard output. As with the HelloWorld program of the first part of this tutorial, this has been implemented in Java in the Reverse class.

In order to deploy this as a Styx Grid Service, we must create an XML description of this program. This is included in the configuration file that is provided with the JStyx distribution (the SGSconfig.xml file in the conf/ directory) but the relevant portion is reproduced here:

<gridservice name="reverse"
    command="JStyxRun uk.ac.rdg.resc.jstyx.gridservice.tutorial.Reverse"
    description="Reads lines of input and outputs them with characters reversed">
  <inputs>
    <input type="stream" name="stdin"/>
  </inputs>
  <outputs>
    <output type="stream" name="stdout"/>
    <output type="stream" name="stderr"/>
  </outputs>
</gridservice>

This specifies that the SGS called "reverse" will read data from its standard input and write data to its standard output and error streams. Run the SGS server by entering:

GridServices

as before. Assuming that the server has started successfully, you can run the service (under Unix or Cygwin) by entering:

cat somefile.txt | SGSRun localhost 9092 reverse

(You may, of course, have to change the hostname and port of the server). The pipe operator | redirects the standard output from the cat program to the standard input of the SGSRun program. The SGSRun program streams this information to the SGS server, which passes it to the reverse program.

You can also run this from Windows from a command prompt:

type somefile.txt | SGSRun localhost 9092 reverse

As with the HelloWorld example, you can create a shell script or batch file called "reverse" that runs SGSRun with the correct hostname and port. This script can then be used in exactly the same manner as the Reverse program itself if it were installed locally.

2. Input and output files

We shall now create an SGS that behaves much like the reverse service from the above section, but works in a slightly different way. Instead of reading data from the standard input and writing to the standard input, our new SGS will read data from an input file and output to a different file. As before, it will read each line of text from the file, reverse it and write the reversed lines to the output file.

The entry in the XML configuration file is slightly more complicated:

<gridservice name="reverse2"
    command="JStyxRun uk.ac.rdg.resc.jstyx.gridservice.tutorial.Reverse"
    description="Reads lines of input and outputs them with characters reversed">
  <params>
    <param name="inputfile" paramType="flaggedOption"
        flag="i" required="yes" description="Name of input file"/>
    <param name="outputfile" paramType="flaggedOption"
        flag="o" required="yes" description="Name of output file"/>
  </params>
  <inputs>
    <input type="fileFromParam" name="inputfile"/>
  </inputs>
  <outputs>
    <output type="fileFromParam" name="outputfile"/>
  </outputs>
</gridservice>

(Note that we are using the same Java program: the Reverse class. If you look at the code for this class you'll see how it works.) Let's work through this section of the configuration file:

  • The <params> section defines the two command-line parameters that are understood by the Reverse program. They are both "flaggedOptions", which means that they are specified through the use of command-line flags. The name of the input file will be given by the item following the -i flag and the output file name will be given by the item following the -o flag.
  • The <inputs> section specifies that the program will take a single input file. The type="fileFromParam" attribute specifies that the name of the input file is given by the value of the parameter called "inputfile", i.e. the value after the -i flag.
  • The <outputs> section specifies that the program will produce a single output file. The type="fileFromParam" attribute specifies that the name of the output file is given by the value of the parameter called "outputfile", i.e. the value after the -o flag.
(See this page for a more complete description of the config file specification.) Having started the SGS server, you can run the reverse2 service like this:

SGSRun localhost 9092 reverse2 -i somefile.txt -o output.txt

The SGSRun automatically uploads the input file (somefile.txt) to the server and downloads the output file (output.txt).

3. Fixed-name input and output files

Sometimes you might want to create an SGS from a program that expects fixed names for its input and output files. For example, the program may always read input from a file called input.txt and write output to output.txt. In this case you will not use command-line parameters to set the names of the input and output files.

To achieve this, in the config file we use the type "file" (instead of "fileFromParam" or "stream") to specify the name of the files. The following piece of XML configures an SGS called "replace", which reads lines of input from an input file called input.txt, replaces all instances of one string with another and writes the result to output.txt:

<gridservice name="replace"
    command="JStyxRun uk.ac.rdg.resc.jstyx.gridservice.tutorial.Replace"
    description="Replaces all instances of one string in a file with another">
  <params>
    <param name="stringToFind" paramType="unflaggedOption"
        required="yes" description="String to find"/>
    <param name="stringToReplace" paramType="unflaggedOption"
        required="yes" description="String to replace"/>
    <param name="verbose" paramType="switch" flag="v"
        longFlag="verbose" description="If set true, will print verbose output to stdout"/>
  </params>
  <inputs>
    <input type="file" name="input.txt"/>
  </inputs>
  <outputs>
    <output type="file" name="output.txt"/>
    <output type="stream" name="stdout"/>
    <output type="stream" name="stderr"/>
  </outputs>
</gridservice>

Note the use of unflaggedOptions to specify the strings to find and replace. These are command-line arguments that do not use a flag to signal their presence. Provided that you have a file called input.txt in your current directory, you can run the replace SGS as follows:

SGSRun localhost 9092 replace hello goodbye

This will replace all instances of the word "hello" with "goodbye" in the file input.txt, writing the results to output.txt. As you may have gathered from the above XML, you can use the command-line flag -v (or --verbose) to produce more verbose output.

Even though the names of the files are fixed, you can still pass references to input files and get references to output files. Type

SGSRun localhost 9092 replace --sgs-verbose-help
and you'll see the arguments that you have to set. For example:

SGSRun localhost 9092 replace hello goodbye
    --sgs-ref-input.txt=readfrom:http://www.google.com --sgs-ref-output.txt

will read input data from http://www.google.com and write a reference to the output data into the file output.txt. (The "readfrom" part is actually unnecessary in this case by the way.) Note that you can also use this technique to stream data from a remote source into the standard input of an SGS (--sgs-ref-stdin=readfrom:URL).

4. Interactive programs

If you have a program that expect user interaction through the command line (i.e. the user enters data at the keyboard), you can expose this as a Styx Grid Service. In fact, you have already done so: the reverse service from section 1 above reads data from its standard input. Try running:

SGSRun localhost 9092 reverse

without piping any data to its standard input. The program will just sit and wait for you to type at the keyboard. Every line you type will be reversed by the reverse SGS and sent back to you, printed on the standard output (console window). This will continue until you enter an end-of-file command (Control-Z in Windows and Control-D in many other systems).

You could expose any interactive program in this way, including the Python interactive shell and the bash shell! Of course, there may be serious security implications connected with doing this if the program you are exposing as an SGS allows the user to enter data that can cause damage to your system. (This is true with any Styx Grid Service, of course.)