We shall now create some Styx Grid Services from programs that are a little more complex than a Hello World program. The process for doing so is exactly the same as before.
In this section of the tutorial we shall look at programs that read some input and produce some output.
A filter is simply a program that reads data from its standard input and writes to its standard output. Filters are very common in Unix and Linux systems.
We shall create a Styx Grid Service that wraps a filter program that reads lines of text from its standard input, reverses each line and prints the lines to the standard output. As with the HelloWorld program of the first part of this tutorial, this has been implemented in Java in the Reverse class.
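To make the idea of a filter concrete, here is a minimal sketch of a line-reversing filter in Java. It is not the Reverse class that ships with JStyx: the class name ReverseFilterSketch and its helper methods are hypothetical and are used only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;

// Hypothetical stand-in for a line-reversing filter; not the JStyx Reverse class.
class ReverseFilterSketch {

    // Returns the characters of a single line in reverse order.
    static String reverseLine(String line) {
        return new StringBuilder(line).reverse().toString();
    }

    // Reads lines from the reader until end-of-file and prints each one reversed.
    static void filter(Reader in, PrintStream out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            out.println(reverseLine(line));
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Standard input to standard output: this is what makes the program a filter.
        filter(new InputStreamReader(System.in), System.out);
    }
}
```

Because a program like this touches only its standard streams, it can be placed in a pipeline or fed from the keyboard without modification.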
In order to deploy this as a Styx Grid Service, we must create an XML description
of this program. This is included in the configuration file that is provided
with the JStyx distribution (the SGSconfig.xml file in the conf/
directory) but the relevant portion is reproduced here:
<gridservice name="reverse"
command="JStyxRun uk.ac.rdg.resc.jstyx.gridservice.tutorial.Reverse"
description="Reads lines of input and outputs them with characters reversed">
<inputs>
<input type="stream" name="stdin"/>
</inputs>
<outputs>
<output type="stream" name="stdout"/>
<output type="stream" name="stderr"/>
</outputs>
</gridservice>
This specifies that the SGS called "reverse" will read data from its standard input and write data to its standard output and error streams. Run the SGS server by entering:
GridServices
as before. Assuming that the server has started successfully, you can run the service (under Unix or Cygwin) by entering:
cat somefile.txt | SGSRun localhost 9092 reverse
(You may, of course, have to change the hostname and port of the server.)
The pipe operator | redirects the standard output from the cat
program to the standard input of the SGSRun program. The
SGSRun program streams this information to the SGS server,
which passes it to the reverse program.
You can also run this from a Windows command prompt:
type somefile.txt | SGSRun localhost 9092 reverse
As with the HelloWorld example, you can create
a shell script or batch file called "reverse" that runs
SGSRun with the correct hostname and port. This script can
then be used in exactly the same manner as the Reverse program itself would be
if it were installed locally.
We shall now create an SGS that behaves much like the reverse
service from the above section, but works in a slightly different way. Instead
of reading data from the standard input and writing to the standard output, our
new SGS will read data from an input file and write to a different file. As
before, it will read each line of text from the input file, reverse it and write
the reversed lines to the output file.
The entry in the XML configuration file is slightly more complicated:
<gridservice name="reverse2"
command="JStyxRun uk.ac.rdg.resc.jstyx.gridservice.tutorial.Reverse"
description="Reads lines of input and outputs them with characters reversed">
<params>
<param name="inputfile" paramType="flaggedOption"
flag="i" required="yes" description="Name of input file"/>
<param name="outputfile" paramType="flaggedOption"
flag="o" required="yes" description="Name of output file"/>
</params>
<inputs>
<input type="fileFromParam" name="inputfile"/>
</inputs>
<outputs>
<output type="fileFromParam" name="outputfile"/>
</outputs>
</gridservice>
(Note that we are using the same Java program: the Reverse class. If you look at the code for this class you'll see how it works.) Let's work through this section of the configuration file:
The <params> section defines the two command-line
parameters that are understood by the Reverse program. They are both
"flaggedOptions", which means that they are specified through
the use of command-line flags. The name of the input file will be given
by the item following the -i flag and the output file name will
be given by the item following the -o flag.
The <inputs> section specifies that the program
will take a single input file. The type="fileFromParam"
attribute specifies that the name of the input file is given by the
value of the parameter called "inputfile", i.e.
the value after the -i flag.
The <outputs> section specifies that the program
will produce a single output file. The type="fileFromParam"
attribute specifies that the name of the output file is given by the
value of the parameter called "outputfile", i.e.
the value after the -o flag.
You can now run the reverse2 service like this:
SGSRun localhost 9092 reverse2 -i somefile.txt -o output.txt
The SGSRun program automatically uploads the input file (somefile.txt)
to the server and downloads the output file (output.txt).
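The following sketch shows one way a program could accept its file names through -i and -o flags, which is the behaviour that the "flaggedOption" and "fileFromParam" settings above describe. It is an assumption made for illustration only: the class name ReverseFilesSketch and the optionValue helper are hypothetical, and the real Reverse class may parse its arguments differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch; the real Reverse class may handle its arguments differently.
class ReverseFilesSketch {

    // Returns the argument that follows the given flag (e.g. "-i"), or null if absent.
    static String optionValue(String[] args, String flag) {
        for (int i = 0; i < args.length - 1; i++) {
            if (args[i].equals(flag)) {
                return args[i + 1];
            }
        }
        return null;
    }

    // Returns the characters of a single line in reverse order.
    static String reverseLine(String line) {
        return new StringBuilder(line).reverse().toString();
    }

    public static void main(String[] args) throws IOException {
        String inputFile = optionValue(args, "-i");
        String outputFile = optionValue(args, "-o");
        if (inputFile == null || outputFile == null) {
            System.err.println("Usage: ReverseFilesSketch -i <inputfile> -o <outputfile>");
            System.exit(1);
        }
        // Read the input file line by line and write each reversed line to the output file.
        try (BufferedReader in = Files.newBufferedReader(Paths.get(inputFile), StandardCharsets.UTF_8);
             PrintWriter out = new PrintWriter(
                     Files.newBufferedWriter(Paths.get(outputFile), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(reverseLine(line));
            }
        }
    }
}
```

The program itself knows nothing about uploading or downloading. It only sees local file names, which is why the configuration file has to say which parameters name input and output files.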
Sometimes you might want to create an SGS from a program that expects fixed
names for its input and output files. For example, the program may always
read input from a file called input.txt and write output to
output.txt. In this case you will not use command-line parameters
to set the names of the input and output files.
To achieve this, in the config file we use the type "file"
(instead of "fileFromParam" or "stream")
to specify the names of the files. The following piece of XML configures an
SGS called "replace", which reads lines of input
from an input file called input.txt, replaces all instances
of one string with another and writes the result to output.txt:
<gridservice name="replace"
command="JStyxRun uk.ac.rdg.resc.jstyx.gridservice.tutorial.Replace"
description="Replaces all instances of one string in a file with another">
<params>
<param name="stringToFind" paramType="unflaggedOption"
required="yes" description="String to find"/>
<param name="stringToReplace" paramType="unflaggedOption"
required="yes" description="String to replace"/>
<param name="verbose" paramType="switch" flag="v"
longFlag="verbose" description="If set true, will print verbose output to stdout"/>
</params>
<inputs>
<input type="file" name="input.txt"/>
</inputs>
<outputs>
<output type="file" name="output.txt"/>
<output type="stream" name="stdout"/>
<output type="stream" name="stderr"/>
</outputs>
</gridservice>
Note the use of unflaggedOptions to specify the strings to
find and replace. These are command-line arguments that do not use a
flag to signal their presence. Provided that you have a file called
input.txt in your current directory, you can run the replace
SGS as follows:
SGSRun localhost 9092 replace hello goodbye
This will replace all instances of the word "hello" with
"goodbye" in the file input.txt, writing the results
to output.txt. As you may have gathered from the above XML,
you can use the command-line flag -v (or --verbose)
to produce more verbose output.
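As an illustration of how unflagged options and a switch might look from the program's side, here is a minimal sketch of a find-and-replace program that uses fixed file names. It is not the Replace class from the JStyx distribution: the class name ReplaceSketch, its helper methods and the wording of its verbose message are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Replace class from the JStyx distribution.
class ReplaceSketch {

    // True if the verbose switch (-v or --verbose) appears anywhere in the arguments.
    static boolean isVerbose(String[] args) {
        for (String arg : args) {
            if (arg.equals("-v") || arg.equals("--verbose")) {
                return true;
            }
        }
        return false;
    }

    // Returns the unflagged arguments in order, skipping the verbose switch.
    static List<String> positionalArgs(String[] args) {
        List<String> positional = new ArrayList<>();
        for (String arg : args) {
            if (!arg.equals("-v") && !arg.equals("--verbose")) {
                positional.add(arg);
            }
        }
        return positional;
    }

    // Replaces every literal occurrence of 'find' in the line with 'replacement'.
    static String replaceInLine(String line, String find, String replacement) {
        return line.replace(find, replacement);
    }

    public static void main(String[] args) throws IOException {
        List<String> positional = positionalArgs(args);
        if (positional.size() != 2) {
            System.err.println("Usage: ReplaceSketch [-v] <stringToFind> <stringToReplace>");
            System.exit(1);
        }
        // The file names are fixed, so they never appear on the command line.
        List<String> result = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("input.txt"), StandardCharsets.UTF_8)) {
            result.add(replaceInLine(line, positional.get(0), positional.get(1)));
        }
        Files.write(Paths.get("output.txt"), result, StandardCharsets.UTF_8);
        if (isVerbose(args)) {
            System.out.println("Processed " + result.size() + " lines");
        }
    }
}
```

The two unflagged options are identified purely by their position, so the order in which they are declared in the <params> section matters.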
Even though the names of the files are fixed, you can still pass references to input files and get references to output files. Type
SGSRun localhost 9092 replace --sgs-verbose-help
to see the options that are available. For example, the command
SGSRun localhost 9092 replace hello goodbye --sgs-ref-input.txt=readfrom:http://www.google.com --sgs-ref-output.txt
will read input data from http://www.google.com and write a reference to the
output data into the file output.txt. (The "readfrom"
part is actually unnecessary in this case, by the way.) Note that you can also
use this technique to stream data from a remote source into the standard
input of an SGS (--sgs-ref-stdin=readfrom:URL).
If you have a program that expects user interaction through the command line
(i.e. the user enters data at the keyboard), you can expose this as a
Styx Grid Service. In fact, you have already done so: the reverse
service from section 1 above reads data from its standard input. Try running:
SGSRun localhost 9092 reverse
without piping any data to its standard input. The program will just sit and
wait for you to type at the keyboard. Every line you type will be reversed
by the reverse SGS and sent back to you, printed on the standard
output (console window). This will continue until you enter an end-of-file
command (Control-Z in Windows and Control-D in many other systems).
You could expose any interactive program in this way, including the Python interactive shell and the bash shell! Of course, there may be serious security implications connected with doing this if the program you are exposing as an SGS allows the user to enter data that can cause damage to your system. (This is true with any Styx Grid Service, of course.)