The earlier sections of this tutorial have shown how remote Styx Grid Services can be executed exactly as if they were local programs. This means that we can link SGSs together to form a distributed application (or "workflow") just as easily as we can link local programs together to achieve a goal. Styx Grid Services, like local programs, can be linked together with simple shell scripts (or batch files under Windows). This section describes how Styx Grid Services can be used in this way.
Let us create a very simple distributed application (or workflow) from two
of the Styx Grid Services that we have already met: HelloWorld and Reverse.
We are going to use the HelloWorld SGS to output the string "Hello World"
and the Reverse SGS to reverse that string.
We can achieve this by piping the output from the HelloWorld SGS
to the input of the Reverse SGS, just as if they were local
programs:
SGSRun localhost 9092 helloworld | SGSRun localhost 9092 reverse
The output from this simple workflow should be "dlroW olleH". If
we were to create wrapper scripts called helloworld and reverse
(as discussed earlier in this tutorial) we could simply write:
helloworld | reverse
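The wrapper scripts themselves are covered earlier in the tutorial. Purely as an illustration, a similar effect could be obtained with shell functions such as the ones below. The variable names SGS_HOST and SGS_PORT are assumptions introduced for this sketch and are not part of the SGS software.

```shell
#!/bin/sh
# Hypothetical defaults: SGS_HOST and SGS_PORT are names invented for this sketch.
: "${SGS_HOST:=localhost}"
: "${SGS_PORT:=9092}"

# Each function forwards its arguments (and its stdin/stdout) to SGSRun.
helloworld() { SGSRun "$SGS_HOST" "$SGS_PORT" helloworld "$@"; }
reverse()    { SGSRun "$SGS_HOST" "$SGS_PORT" reverse "$@"; }

# Usage, assuming SGSRun is on the PATH and an SGS server is running:
#   helloworld | reverse
```

Setting SGS_HOST before sourcing the functions would point both services at a different server without changing the workflow line itself.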
In the above example, both SGSs were running on the same server. If you are able, try running the SGS server on two different machines and performing the same workflow again, for example:
SGSRun machine1 9092 helloworld | SGSRun machine2 9092 reverse
The above example demonstrated the use of the pipe operator to send the data
between the two SGSs. You could of course send the data to an intermediate
file and use the reverse2 SGS, which reads input from a file rather
than from its standard input:
SGSRun localhost 9092 helloworld > temp.txt
SGSRun localhost 9092 reverse2 -i temp.txt -o reversed.txt
The file reversed.txt should now contain the string "dlroW olleH".
One of the strengths of the SGS system lies in the fact that you can pass input files by reference. In other words, instead of specifying an actual input file, you can specify a URL to a file on a different server.
For example, let's run the reverse2 Styx Grid Service, using
input data from the Web:
SGSRun localhost 9092 reverse2 -i readfrom:http://www.google.com -o output.txt
When this finishes, open output.txt and verify that it contains
the contents of the Google home page (in HTML), with the characters
of each line of text reversed.
IMPORTANT: You must use the syntax "-i readfrom:URL"
rather than just "-i URL". There is a good reason for this, which
we won't go into now.
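Since the "readfrom:" prefix is easy to forget in a script, a small helper can add it when it is missing. The function below is only a sketch; as_sgs_ref is a hypothetical name and is not part of the SGS software.

```shell
#!/bin/sh
# as_sgs_ref is a hypothetical helper name, not part of the SGS software.
# It prints its argument with the "readfrom:" prefix, adding the prefix
# only if the argument does not already start with it.
as_sgs_ref() {
    case "$1" in
        readfrom:*) printf '%s\n' "$1" ;;
        *)          printf 'readfrom:%s\n' "$1" ;;
    esac
}

# Usage, assuming SGSRun is on the PATH and an SGS server is running:
#   SGSRun localhost 9092 reverse2 -i "$(as_sgs_ref http://www.google.com)" -o output.txt
```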
Let's have a quick look in more detail at what has happened in this example: the URL, rather than the contents of the file, was passed to the reverse2 service, and the data were then read from that URL on the server side and given to the reverse2 program.
Let's create a silly workflow of two Styx Grid Services. We're going to reverse the contents of a file, then do the same again so that the contents of the final result are identical to the original file:
SGSRun localhost 9092 reverse2 -i input.txt -o output1.txt
SGSRun localhost 9092 reverse2 -i output1.txt -o output2.txt
If you run this with some input file (or you could pass in data from a
URL as above) you should be able to verify that input.txt
and output2.txt have the same contents.
Let's pretend that we were working with large files and that we weren't
interested in the intermediate file (output1.txt). We have
wasted time and bandwidth by downloading output1.txt to our
local machine and then immediately uploading it to the second service
in the above workflow.
We can be more efficient by downloading (and then uploading) a reference
to the intermediate file, with a small change to the workflow. We just add
a .sgsref extension to any output file that we want to get
a reference to. Then we can upload that reference exactly as if it were
the file itself:
SGSRun localhost 9092 reverse2 -i input.txt -o output1.txt.sgsref
SGSRun localhost 9092 reverse2 -i output1.txt.sgsref -o output2.txt
You should be able to verify that this has the same overall effect as the
previous workflow. If you examine the contents of the output1.txt.sgsref
file you will find that it contains the string
"readfrom:styx://.../reverse2/instances/.../outputs/outputfile".
This is a reference to the output file that was produced by the first
SGS.
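A script can check whether a file holds a reference rather than real data by looking for the "readfrom:" prefix. The sketch below assumes that the reference is the first thing in the file, as in the string quoted above; is_sgs_ref is a hypothetical helper name.

```shell
#!/bin/sh
# is_sgs_ref is a hypothetical helper name, not part of the SGS software.
# It succeeds (exit status 0) when the first line of the given file begins
# with "readfrom:", and fails otherwise (including when the file is missing).
is_sgs_ref() {
    [ -f "$1" ] || return 1
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        readfrom:*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Usage:
#   if is_sgs_ref output1.txt.sgsref; then echo "reference only"; fi
```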
Let's go back to the first example in this section of the tutorial. We printed the string "Hello World" then reversed it using two SGSs:
SGSRun localhost 9092 helloworld | SGSRun localhost 9092 reverse
What happened behind the scenes was this: the standard output from the
helloworld service was sent to our client machine. Instead of being
printed in the local console window, it was piped immediately to the
remote reverse service. In other words the data made an
unnecessary trip to our client machine and back out again.
As above, we can arrange for the data to be passed directly between the
two services. However, this time we have no filename to which we can append
the magic ".sgsref" extension, so what do we do? You can
find out by using the help system: enter SGSRun localhost 9092 helloworld --sgs-verbose-help
(see Getting help). There is a command-line
switch --sgs-ref-stdout, which will cause a reference to the
output data to be printed to the console window instead of the data themselves.
It is this reference that is passed to the reverse service:
SGSRun localhost 9092 helloworld --sgs-ref-stdout | SGSRun localhost 9092 reverse
The string "Hello World" has been passed directly between the two services.
You should now be getting the picture that you can create shell scripts (or batch files) that tie Styx Grid Services together to produce distributed applications. The SGSRun program behaves exactly like the program that has been wrapped as a Styx Grid Service. It even captures the error code from the remotely-running executable and returns this error code when it finishes. Therefore, you can trap this error code to see if the remote executable has finished successfully.
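As a minimal sketch of trapping the error code in a shell script, the helper below runs one step of a workflow and reports a non-zero exit status. The name run_step is hypothetical and is used here only for illustration.

```shell
#!/bin/sh
# run_step is a hypothetical helper name, not part of the SGS software.
# It runs the given command; if the command exits with a non-zero status,
# it reports this on stderr and returns the same status to the caller.
run_step() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "step failed with exit code $status: $*" >&2
    fi
    return "$status"
}

# Usage, assuming SGSRun is on the PATH and an SGS server is running:
#   run_step SGSRun localhost 9092 reverse2 -i input.txt -o output1.txt || exit 1
#   run_step SGSRun localhost 9092 reverse2 -i output1.txt -o output2.txt || exit 1
```

Note that when commands are joined with the pipe operator, the shell by default reports only the exit status of the last command in the pipeline, so steps that must be checked individually are easier to handle when run one at a time as above.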
The SGS system is a very quick and easy way to create workflows that are based on remote services. We have seen how data can be passed directly between services. However, unlike other workflow systems (e.g. Web Service-based ones), in which the units of information might be strings, integers or perhaps objects, the units of information that are passed around in an SGS workflow are files. This means that it is up to each individual service in an SGS workflow to verify that its input files are valid (the inputs and outputs are very weakly typed). Exactly the same problem is of course faced when using shell scripts to tie together local programs.