What are Styx Grid Services?

Styx Grid Services (SGSs) are a means for wrapping command-line programs and allowing them to be used remotely over the Internet. When deployed as an SGS, a program can be run from anywhere on the Internet exactly as if it were a local program. (Note that the Styx Grid Services system will soon be renamed VEIGA: the Very Easy Interface to Grid Applications. Watch this space!)

This poster is a good summary of the main capabilities of the SGS system. This paper is a more detailed description of the capabilities of the SGS system.

The Styx Grid Services software is built on top of, and is bundled with, the JStyx library, which is a pure-Java implementation of the Styx protocol for distributed systems. See here for more details about JStyx.

Why use the SGS system?

Styx Grid Services are useful when you want to run a program on a machine that is not your own; in this respect they are somewhat analogous to Web Services. You might want to do this because:

  1. The program requires a different operating system or architecture from your own machine
  2. The program requires a larger amount of memory or processing power than you have on your own machine
  3. The program requires access to a data store that you cannot access from your local machine
  4. You want to allow other people to run the program from elsewhere on the Internet

Styx Grid Services are simple to install and use, and they require very little software: just a Java virtual machine and the JStyx libraries.

Styx Grid Services and workflows

SGSs can be composed into "workflows", in which a number of SGSs, perhaps in different locations, can be combined using very simple shell scripts to create distributed applications. Data can be streamed directly between the services along the shortest network path.

Let us consider a simple distributed application, consisting of two Styx Grid Services. The first is called calc_mean and reads a set of input files from a set of scientific experiments, calculates their mean and outputs the result as a file. The second SGS is called plot and it might be deployed in a completely different location from the first service. It takes a single input data file and turns it into a graph. The shell script (workflow) that would be used to take a set of input files, calculate their mean and plot a graph of the result would be:

calc_mean input*.dat -o mean.dat
plot -i mean.dat -o graph.gif

The important thing to note is that this is exactly the same script as would be used to run the programs if they were installed locally. The input files for each service have been detected and uploaded automatically and the output files have been automatically downloaded.

The intermediate file mean.dat can be passed directly between the two services (i.e. without being downloaded by the client) with a small change to the script. The only difference is the .sgsref suffix on the intermediate file name:

calc_mean input*.dat -o mean.dat.sgsref
plot -i mean.dat.sgsref -o graph.gif
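
Because a workflow is an ordinary shell script, the usual shell constructs (functions, loops, exit-status checks) can be wrapped around it. The following is a minimal sketch, not part of the SGS distribution. It assumes that the SGS client commands return a non-zero exit status on failure, as command-line programs conventionally do. The function name run_workflow and the one-directory-per-experiment layout are hypothetical, and the two stand-in functions exist only so that the sketch can be tried on a machine where calc_mean and plot are not installed.

```shell
#!/bin/sh
# Hypothetical stand-ins, used only when the real SGS client commands are
# not on the PATH. They do no real computation.
if ! command -v calc_mean >/dev/null 2>&1; then
    calc_mean() {
        # Called as: calc_mean INPUT... -o OUTPUT
        # Find the last argument, which is the output file name.
        for last in "$@"; do :; done
        echo "mean of $(( $# - 2 )) input file(s)" > "$last"
    }
fi
if ! command -v plot >/dev/null 2>&1; then
    plot() {
        # Called as: plot -i INPUT -o OUTPUT
        cp "$2" "$4"
    }
fi

# Run the two-step workflow inside one experiment directory.
# The subshell keeps the change of directory local to this call, and the
# && chain stops at the first step that fails (assumed non-zero exit status).
run_workflow() {
    (
        cd "$1" &&
        calc_mean input*.dat -o mean.dat.sgsref &&
        plot -i mean.dat.sgsref -o graph.gif
    )
}
```

With this in place, a loop such as `for d in experiment1 experiment2; do run_workflow "$d" || echo "workflow failed in $d" >&2; done` would process several experiments in turn and report any that fail. The directory names here are placeholders.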

Getting Started

Download and install the JStyx software, as described on the downloads page. Then follow the Styx Grid Services tutorial.

Further reading

Here are some publications about the SGS system in (roughly) decreasing order of usefulness:

Jon Blower, Andrew Harrison, Keith Haines, Styx Grid Services: Lightweight, easy-to-use middleware for scientific workflows, accepted for oral presentation at the International Conference on Computational Science 2006, and for publication in Lecture Notes in Computer Science. [download paper]

Jon Blower, Keith Haines, Ed Llewellin, Styx Grid Services: lightweight, easy-to-use middleware for e-Science, presented in the UK e-Science booth at SuperComputing, Seattle, 15-17 November 2005. [download presentation] [download poster]

Jon Blower, Keith Haines, Ed Llewellin, Data streaming, workflow and firewall-friendly Grid Services with Styx, Proceedings of the UK e-Science All Hands Meeting, 19-22 September 2005. [download paper] [download presentation]