Parallel Processing and Cluster Management at ESSC

Introduction to High Performance Computing Clusters

The Environmental Systems Science Centre (ESSC) recently took delivery of a high performance computing (HPC) cluster from Netherlands-based specialists ClusterVision. A "cluster" is any collection of independent computers brought together to form a single computing resource that is more powerful in some way than any of its individual elements. HPC clusters are designed for applications that require high processing speeds, large amounts of memory or both. The ESSC cluster is one of a class of HPC clusters known as Beowulf clusters, first developed by the Beowulf Project in 1994. Beowulf clusters provide low-cost access to HPC resources by utilising commodity off-the-shelf (COTS) hardware components and (usually) free, Open Source software such as the Linux operating system.

ESSC Cluster Hardware and Operating System

The specifications of the main components of the ESSC cluster are summarised in the following table.

Component                          Specification
1 head node:
  No. of processor cores           2
  Processor description            2 x AMD Opteron 248 (64-bit)
  Processor speed                  2.2 GHz
  Memory size                      2 GB
  Hard disk size                   3 TB
  User interface                   Keyboard, mouse & monitor
16 processing nodes (per node):
  No. of processor cores           4
  Processor description            2 x AMD Opteron 275 (Dual Core, 64-bit)
  Processor speed                  2.2 GHz
  Memory size                      8 GB
  Hard disk size                   80 GB
Operating system (all nodes)       ClusterVision OS 2.1 (based on OpenSuSE 10.0)
2 networking switches:
  Myrinet
  Gigabit Ethernet
Uninterruptible power supply       APC

The cluster nodes are rack server units, housed in a standard-sized server rack along with the two networking switches and the uninterruptible power supply (UPS). The head node co-ordinates the activities of the 16 processing nodes and is the only node that is connected to the outside world via the ESSC local area network. Together, the processing nodes have a total of 64 processor cores and 128 GB of memory.
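The totals quoted above follow directly from the per-node figures in the table. As a quick check, a minimal Python sketch (the variable names are illustrative only):

```python
# Per-node figures taken from the specification table.
PROCESSING_NODES = 16
CORES_PER_NODE = 4        # 2 x dual-core AMD Opteron 275
MEMORY_PER_NODE_GB = 8

# Aggregate resources across all processing nodes (head node excluded).
total_cores = PROCESSING_NODES * CORES_PER_NODE
total_memory_gb = PROCESSING_NODES * MEMORY_PER_NODE_GB

print(f"{total_cores} cores, {total_memory_gb} GB of memory")
```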

Cluster Management

  • The ESSC cluster management system is Sun Grid Engine (SGE).
  • We use load-based scheduling, which means that the choice of nodes to be allocated to a waiting job is based on their relative computational load.
  • Time sharing is implemented by limiting the total run time of every job to three hours. Jobs that need more time than this must be split into three-hour chunks. The limit ensures that no job has to wait more than three hours before being allowed to run. If two or more jobs are waiting when some nodes become available, the nodes are allocated to the job that has been waiting the longest.
  • A job requiring a large number of processors can reserve the nodes it needs when it is submitted. To avoid reserved nodes being idle for long periods, SGE uses a technique called back-filling, where a job can run on reserved nodes if SGE knows that the job will definitely finish before the reserving job is due to start.
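The scheduling rules listed above can be illustrated with a short Python sketch. This is not how SGE is implemented or configured; it is a simplified model of the four policies, and every name in it (chunks_needed, least_loaded, WaitingJob, next_job, can_backfill) is hypothetical. It assumes that a job's run time is known in advance, which stands in for whatever SGE knows about a job when deciding whether to back-fill.

```python
import math
from dataclasses import dataclass

RUN_LIMIT_HOURS = 3.0  # per-job run-time limit described in the text


def chunks_needed(total_hours: float) -> int:
    """Number of chunks a long job must be split into to respect the limit."""
    return max(1, math.ceil(total_hours / RUN_LIMIT_HOURS))


def least_loaded(node_loads: dict, count: int) -> list:
    """Load-based scheduling: pick the `count` nodes with the lowest load."""
    return sorted(node_loads, key=node_loads.get)[:count]


@dataclass
class WaitingJob:
    name: str
    submitted_at: float  # hours since an arbitrary reference time


def next_job(queue: list) -> WaitingJob:
    """The job that has waited longest is the one submitted earliest."""
    return min(queue, key=lambda job: job.submitted_at)


def can_backfill(now: float, run_hours: float, reservation_start: float) -> bool:
    """A job may use reserved nodes only if it ends before the reservation starts."""
    return now + run_hours <= reservation_start


# Example with made-up loads and times.
loads = {"node01": 0.9, "node02": 0.1, "node03": 0.4}
queue = [WaitingJob("b", 2.0), WaitingJob("a", 0.5)]
print(chunks_needed(7.5), least_loaded(loads, 2), next_job(queue).name)
print(can_backfill(now=0.0, run_hours=2.0, reservation_start=2.5))
```

Because every job is capped at three hours, the run time passed to can_backfill never needs to exceed RUN_LIMIT_HOURS, which is what makes a simple "will it finish in time" test feasible in this model.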

Future Plans

One way to improve the sharing of cluster resources is to link up two or more clusters to form a grid, in effect a cluster of clusters. We are planning to set up a cluster grid in association with some of our partner institutions. The National Grid Service (NGS) is an example of a very large grid of processing and data storage resources.

Cluster grids provide several advantages compared to keeping the clusters in different institutions separate. For example, if one cluster in the grid is better suited to long jobs and another to short jobs, conflicts between the two job types can be reduced by submitting jobs to the appropriate cluster in the grid. Another cluster in the grid may be better suited to High Throughput Computing (HTC), which seeks to maximise the number of small jobs completed over a long period of time. A Condor pool consisting of a set of networked desktop computers is a good example of an HTC cluster; a Condor pool such as the Reading Campus Grid could be used to run all the jobs in the cluster grid that can fit onto one computer, leaving the more expensive HPC clusters free for jobs that require two or more computers at the same time.

The most effective way to manage a cluster grid is to perform job scheduling using similar strategies to those used for scheduling within an individual cluster. We are currently investigating various ways of achieving this.
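The division of work between a Condor pool and the HPC clusters described above amounts to a simple routing rule. The following Python sketch shows one way it might look. The function name and the resource labels "condor_pool" and "hpc_cluster" are hypothetical, and a real grid scheduler would weigh many more factors, such as job length and current load.

```python
def choose_resource(computers_needed: int) -> str:
    """Route a job according to how many computers it needs at the same time."""
    if computers_needed < 1:
        raise ValueError("a job needs at least one computer")
    if computers_needed == 1:
        # Fits onto one computer: send it to the HTC resource.
        return "condor_pool"
    # Needs two or more computers at once: keep it on an HPC cluster.
    return "hpc_cluster"


print(choose_resource(1))
print(choose_resource(8))
```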

-- DanBretherton - 14 Aug 2006

Topic revision: r3 - 14 Aug 2006 - 14:47:00 - DanBretherton
 