Shutting down and restarting gorgon

Gorgon, the master node, can be shut down and restarted using the normal Linux "shutdown" command. A restart can also be initiated at the console by pressing CTRL-ALT-DEL. If you try this, note that NO WARNING IS GIVEN before the restart takes place. Presumably this also works for the processing nodes.

Gorgon can be restarted without affecting the processing nodes, but any jobs running on the cluster at the time will be lost. When gorgon has restarted, the SGE services should return to normal without intervention. The only thing that might need to be restarted manually is the UMUI service. This service should now start automatically, but the automatic start had not been tested at the time of writing.
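
One way to confirm that SGE has recovered after a restart is to query the qmaster from gorgon. The sketch below assumes the standard SGE client command qhost is on the PATH (i.e. the SGE environment has been loaded); the function name check_sge and the QHOST variable are invented for illustration and are not part of the cluster setup.

```shell
#!/bin/sh
# Hypothetical post-restart check: ask the SGE qmaster for its host list.
# QHOST can be overridden; by default the standard SGE "qhost" client is used.
check_sge() {
    qhost_cmd="${QHOST:-qhost}"
    if ! command -v "$qhost_cmd" >/dev/null 2>&1; then
        echo "'$qhost_cmd' not found; is the SGE environment loaded?" >&2
        return 2
    fi
    # qhost lists the execution hosts known to the qmaster. It is expected
    # to exit non-zero if the qmaster cannot be contacted.
    "$qhost_cmd"
}
```

Calling check_sge should print one line per execution host if the qmaster is answering. A failure here only suggests that SGE needs a closer look; it says nothing about the UMUI service.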

Starting the UMUI service manually can be accomplished by running the following command as user um:

/home/users/um/umui/umuictl/umuictl.tcl start

It is important to run this command as user "um" rather than as root. If it accidentally gets started as root, stop the service with "umuictl.tcl stop" and change the ownership of the UMUI database back to um:essc with the following command:

chown -R um:essc /home/users/um/umui/umui2.0/DBSE

After that it is safe to start the service again as user "um".
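
To avoid starting the service as root by accident, the start command can be wrapped in a small guard. This is only a sketch: the function name start_umui and the UMUICTL and UMUI_USER variables are hypothetical, and the only real command it runs is the umuictl.tcl start shown above.

```shell
#!/bin/sh
# Hypothetical wrapper: refuse to start the UMUI service unless the
# current user is the expected one ("um" by default).
UMUICTL="${UMUICTL:-/home/users/um/umui/umuictl/umuictl.tcl}"
UMUI_USER="${UMUI_USER:-um}"

start_umui() {
    current_user="$(id -un)"
    if [ "$current_user" != "$UMUI_USER" ]; then
        echo "Refusing to start UMUI as '$current_user'; run as '$UMUI_USER'." >&2
        return 1
    fi
    # Same command as documented above, run only after the user check passes.
    "$UMUICTL" start
}
```

Running start_umui as root prints the refusal message and returns a non-zero status without touching the service, so the ownership of the UMUI database is left alone.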

Shutting down and restarting nodes

The cluster administration tools (pshutdown, ppoweroff etc.) are described in the ClusterVision Administrator Manual. Here are some supplementary tips:
  • If a node has to be powered off, it must be shut down first. In other words, always run pshutdown before using ppoweroff.
  • When ppoweron is used to switch on the power to a node, the node automatically boots up.
  • The pshutdown command only works when the node is running Linux and responding to pings.
  • When a node has been restarted it will often not be able to see one or more of the other nodes through Myrinet. See "Correcting Myrinet mapper problems" below.
  • Using one of the "p" commands (e.g. pshutdown) without specifying any nodes will result in the command being applied to all the slave nodes.
Special care must be taken when using ppoweroff and ppoweron with the nodes that share power outlets with new nodes 101-104. These are listed in the table in the PDU section above.

Special care must also be taken when starting up all the slave nodes at the same time. See the "Current Issues" entry for the 24th September 2007 for details.
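
Because a "p" command with no nodes applies to every slave node, a small guard can catch an empty node list before anything is run. The sketch below is hypothetical: safe_pcmd is an invented name, and it assumes that node names are passed to the "p" commands as plain arguments. Check the ClusterVision Administrator Manual for the real syntax before relying on it.

```shell
#!/bin/sh
# Hypothetical guard: run a "p" command only when at least one node is named.
# Assumption: nodes are given as plain arguments after the command name.
safe_pcmd() {
    cmd="$1"
    if [ "$#" -lt 2 ]; then
        echo "Refusing to run '$cmd' with no nodes: it would apply to all slave nodes." >&2
        return 1
    fi
    shift
    # Pass the remaining arguments (the node list) through unchanged.
    "$cmd" "$@"
}
```

For example, "safe_pcmd pshutdown" with no further arguments is rejected, while "safe_pcmd pshutdown NODE" (NODE being a placeholder for a real node name) runs pshutdown on that node only. The guard does not enforce the pshutdown-before-ppoweroff order described in the tips, which still has to be followed by hand.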

-- DanBretherton - 17 Jul 2009

Topic revision: r1 - 17 Jul 2009 - 17:27:45 - DanBretherton
 