ESSC IT News
Rocks cluster trouble today
--
DanBretherton - 18 May 2012
Two Rocks compute nodes have hung in the past few hours, compute-0-0 late last night and compute-0-2 a few minutes ago. This is likely to be related to the unusually heavy load the cluster is experiencing at the moment, and I am taking steps to reduce the impact of the large number of batch jobs that are currently running and waiting to run. I will reboot the hung nodes when I am confident that the new scheduler settings will stop this from recurring.
New Scheduled Maintenance wiki page
--
DanBretherton - 18 May 2012
This
new wiki page will be used to show details of planned maintenance operations. All users will be notified by email when a new maintenance operation is scheduled, but there will be no further notifications if the schedule is changed unless the planned scope of the operations changes significantly. Therefore, please check the
Scheduled Maintenance page regularly in the run up to any scheduled maintenance period, in case the likely impact on your work has changed from when the schedule was first announced.
I have scheduled a new maintenance operation this evening involving the Rocks cluster.
ESSC-Meteorology merger - REMINDERS
--
DanBretherton - 17 May 2012
Please remember to use
itmet@readingNOSPAMPLEASENOSPAMPLEASE.ac.uk as the email address (without the "NOSPAMPLEASE") for all IT support issues. I read these messages as they come in and respond if they relate to ESSC systems. If I am not available then another member of the IT team may be able to help. Thanks to all users (or "user", more accurately) who are doing this already. In order to make sure I spot ESSC specific queries quickly, please put "ESSC" in your message subjects.
Please remember that all ESSC backups will stop when odin is retired at the end of July, in accordance with the Meteorology department's IT
Service Level Agreement. For more information please refer to the
IT Merger section of the wiki, which contains a new section on backups that outlines various alternative solutions that users might wish to investigate for their own data.
E-Infrastructure Summer School
--
DanBretherton - 17 May 2012
If the ESSC clusters don't meet your needs then you might be interested in
this opportunity to learn how to use clouds, grids and other national e-infrastructure. Book soon if you want to attend, because they had 130 applications for 30 places last year apparently.
Bank holiday maintenance work
--
DanBretherton - 17 May 2012
The following merger related operations will be carried out over the bank holiday weekend, starting at 18:00 on Friday the 1st June.
- Restart of jupiter and saturn. Please refer to the Rocks Cluster user guide for instructions on how to keep cluster processes running when jupiter and saturn are restarted. Following their restarts, it will not be possible to log in to jupiter or saturn using ESSC accounts; University/ITS accounts must be used instead.
- Termination of ALL processes belonging to ESSC user accounts on the Rocks cluster and other compute and storage servers. University/ITS accounts will not be affected.
- Completion of ownership changes for data on the storage cluster and individual storage servers.
- Ownership of ESSC user accounts and home directories changed to match users' University/ITS accounts. The ownership of most other data will hopefully have already been changed by then. See "Data ownership changes" below for more information.
In preparation for these operations, please log out of all systems where you are using your ESSC account before 18:00 on the 1st June. Apart from the jupiter and saturn restarts, which I will try to get out of the way quickly on the 1st June, Linux desktop sessions and cluster processes belonging to University/ITS accounts will not be affected.
Please check the
Scheduled Maintenance page for updates to the schedule and the
IT Status page for progress updates.
Following these changes, it is strongly recommended that everyone uses their University/ITS accounts instead of their ESSC accounts. The only exception will be users whose data on other servers still belongs to their ESSC account. Hopefully there will not be anyone in this category, but you will be notified individually if the ownership of your data has not been completed by Wednesday the 6th June.
NEMO users' UNIX group
--
DanBretherton - 17 May 2012
One of the planned changes involved in the IT merger with Meteorology is the migration to Met controlled user and group management. One consequence of this change will be the loss of
automatically shared write access to files in WORK directories belonging to members of the "nemo" UNIX group. By "automatically shared write access" I mean that
new files created in WORK directories are automatically writeable by the "nemo" group. When this feature is removed it will still be possible to enable shared write access to any directory of your choosing, by changing its ownership to "nemo" (using
chown) and making it writeable by group (using
chmod). However, when the system change comes into force, new files created in these directories will be writeable only by their owners. This will not affect shared
read access, only write access. In preparation for this change I am going to start making the following changes to "nemo" group members' data and user accounts immediately.
- Disable automatically shared write access by removing the line "umask 002" from each user's .cshrc file.
- Change the permissions of all files and directories not in the "nemo" group so that they are writeable only by their owners.
- Change the group ownership of all files and directories not in the "nemo" group to "essc".
If automatically shared write access to new "nemo" owned files is still needed following these changes, users might be able to implement this facility themselves using Access Control Lists (ACLs). These are described in a
useful article by Neil Blanchonnet. Please note that ACLs have not been tested on ESSC's storage and compute servers and may not be a reliable solution.
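The effect of the umask setting described above can be seen with a few shell commands. This is only an illustrative sketch: it works in a throwaway directory, uses your own primary group as a stand-in for "nemo", uses chgrp to do the group part of the ownership change, and the file names are made up.

```shell
#!/bin/sh
# Sketch only: a temporary directory and the caller's own primary group
# stand in for a real WORK directory and the "nemo" group (assumptions).
demo_dir=$(mktemp -d)
group=$(id -gn)

shared="$demo_dir/shared"
mkdir "$shared"
chgrp "$group" "$shared"   # set the group ownership of the directory
chmod g+w "$shared"        # make the directory writeable by that group

# With "umask 002" new files are created group-writeable ...
( umask 002; touch "$shared/with_002" )
# ... with "umask 022" new files are writeable only by their owner.
( umask 022; touch "$shared/with_022" )

# Compare the permission columns of the two files.
ls -l "$shared"
# Remove "$demo_dir" by hand when you have finished looking at it.
```

The subshells (the parentheses) keep each umask change local, so your login shell's own umask is not altered by running the example.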
Data ownership changes
--
DanBretherton - 16 May 2012
I have tried to schedule this several times already this year (see various updates on "More odin hardware maintenance") but I have been thwarted by hardware problems every time. We are now getting uncomfortably close to the retirement of odin at the end of July, so
the change of data ownership to match our University/ITS accounts is now an urgent issue. All the storage servers are working at the moment so I am going to proceed with the ownership changes over the coming evenings (after 18:00) and weekends. The work will proceed as follows.
- Data on the storage cluster and individual storage servers will be done first.
- ESSC home directories will be done during the coming bank holiday weekend. See "Bank holiday maintenance work" above for more information.
- Users with a relatively small number of files will have their data done first. Each user will receive an email informing them that the ownership change is about to start, and another one when the change has been completed. When the ownership change of your data has been completed you should stop using your ESSC account and use your University/ITS account instead.
- If I see active tasks belonging to an ESSC user account running on the Rocks cluster or any of the individual storage or compute servers, I will postpone the ownership change of data belonging to that account until the following evening or weekend. However, idle shells, text editors and other applications will be terminated without warning. Therefore, until the ownership of your data has been changed, please remember to save your work every evening and close unwanted applications before 18:00. Additionally, please tell me if you intend to work after 18:00 or at the weekend, or if you particularly need to keep one or more idle processes running.
- In cases where a large number of files are involved (i.e. 10^5 or more) the ownership change could take several days, possibly even weeks if there are millions of files. There is nothing that can be done to speed this up, and while the ownership change is in progress there will be some files belonging to the University/ITS account and some still belonging to the ESSC account. I am aware of a few users who fall into this category and I will leave their ownership changes until the bank holiday weekend. However, it is entirely possible that the ownership change for these users' data will still be going after the bank holiday. If you think you are in this category and there are particular directories you would like to be changed first, please discuss this with me as soon as possible.
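To get a rough idea of whether you fall into the "large number of files" category, you can count the entries under a directory yourself. The sketch below is an assumption about how one might do this, not the procedure used for the ownership change itself. The function name count_entries and the demo files are made up, and the directory is created in a temporary location so the example can be run anywhere.

```shell
#!/bin/sh
# count_entries DIR: print the number of files and directories at or below DIR.
count_entries() {
    # -xdev keeps find on the filesystem that DIR lives on
    find "$1" -xdev | wc -l | tr -d ' '
}

# Placeholder data (hypothetical names); point count_entries at your own
# directory instead.
demo=$(mktemp -d)
mkdir "$demo/run1"
touch "$demo/run1/a.nc" "$demo/run1/b.nc" "$demo/notes.txt"

total=$(count_entries "$demo")
echo "$total entries under $demo"

threshold=100000    # the 10^5 figure mentioned above
if [ "$total" -ge "$threshold" ]; then
    echo "at or above the threshold: the ownership change could take days"
else
    echo "below the threshold"
fi
```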
Wireless network upgraded - UPDATE
--
DanBretherton - 16 May 2012
Thanks to everyone who responded to my request for feedback on the quality of the wireless network following the installation of two new WAPs. The main problem is conflict between neighbouring WAPs in different departments, which causes the signal strength to vary significantly over time in some parts of the building. This is a known problem that also affects other buildings on campus, and IT Services have been making modifications to the central wireless network controller in an attempt to stop it from happening. Therefore it would be useful to have some more feedback in order to find out if things have improved. If you haven't tried connecting to the wireless network recently please try again, and let me know if the signal strength is any more stable than it was before. Please refer to the
wireless network section of the wiki for instructions on how to connect.
Backup tape library not working - UPDATE
--
DanBretherton - 10 May 2012
A restart of odin is required to solve the SCSI bus problems that have been affecting the tape library. I am going to restart odin at 4PM, which may cause Linux desktops and applications to freeze for a minute or two. The backups have been working for the past two days but I have been told by an expert at Oracle that they may stop working again unless the server is restarted.
Backup tape library not working
--
DanBretherton - 04 May 2012
The backup tape library has stopped working and no backups will be performed until further notice. Please check the
IT Status page for updates. There have been intermittent problems for over a month and almost every part in the library has been replaced including the chassis itself. I am waiting for advice from Oracle but I suspect another site visit will be required, which won't happen until next week. Therefore, please make sure that all important documents and source files are stored in University/ITS home directories instead of ESSC home directories from now on.
Recent storage cluster disruption - UPDATE
--
DanBretherton - 02 May 2012
The hardware problems with one of the storage nodes (bdan12) are still causing periodic disruption to storage cluster access (see IT News, 3rd April). To solve the intermittent mount failure problem that has been affecting the Rocks cluster compute nodes for the past couple of days, all the volumes will be taken off line at 13:00 today. The work is expected to take only a few minutes but some running processes might hang or be killed. Please check the
IT Status page for updates.
The bdan12 hardware problems were solved at the weekend, but the process of checking filesystems and re-synchronising will take several more days.
Rocks cluster: medusa restart - UPDATE
--
DanBretherton - 28 Apr 2012
In order to reduce the risk of programs being accidentally run on medusa instead of one of the compute nodes, medusa can now only mount users' home directories. No storage cluster volumes are mounted. This information has been added to the Rocks Cluster section of this wiki. Incorrect use of medusa contributed to the problem with Grid Engine that developed yesterday and led to a restart being required.
All interactive sessions were terminated when medusa restarted except
screen shells. Some batch jobs were also terminated, which wasn't expected. In order to find out why
it will be necessary to restart medusa once again, and I plan to do this late on Sunday evening when a series of important jobs has finished.
Rocks cluster: medusa restart required this evening
--
DanBretherton - 27 Apr 2012
Unfortunately I need to restart medusa in order to fix a problem that is preventing some important batch jobs from running over the weekend. Batch jobs currently in progress should be unaffected, but interactive command prompts on compute nodes started with qrsh will be killed. I'm not sure what will happen to gnome-terminal windows started with qlogin; they might stay running but be prepared for them to be killed as well.
I am going to restart medusa at 18:00.
Remember to use batch mode or
screen to avoid background processes on compute nodes being killed when interactive shells or terminal windows are terminated. For more information see the 5th April IT News update "Rocks Cluster user guide updated".
One of the symptoms of medusa's problem is batch jobs and interactive sessions starting on the test node, compute-0-4. If you have any processes running on this node please can you kill them.
Merger related changes to scientific software applications
--
DanBretherton - 27 Apr 2012
Please read the new information about Scientific Software Applications in the
IT Merger section of this wiki. Please start using the Meteorology department's applications as soon as possible, to make sure that all the problems are solved before odin is retired at the end of July.
Meteorology account holders: decisions required - UPDATE
--
DanBretherton - 25 Apr 2012
The requested changes to home directories and shells were applied in Meteorology this morning and will be applied in ESSC after 18:00 this evening. This will only affect new logins with University/ITS accounts.
New user account registrations
--
DanBretherton - 25 Apr 2012
The Meteorology IT team is now in charge of University/ITS user account management. New members of staff and supervisors of new students must download and fill in the Met Department Account form on the
Met Department's forms web page, and send copies to both Dan (or just send me an email with the details) and the
Meteorology IT team. Unfortunately user accounts have to be maintained separately for Meteorology and ESSC because the same type of service (NIS) controls both user accounts and auto-mount.
Serious fault on odin - all data unavailable - UPDATE
--
DanBretherton - 19 Apr 2012
All the data on odin is accessible again and I have changed back to the normal home directories for ESSC accounts (i.e. /users/<USER_ID>). The problem affecting University/ITS accounts was caused by odin becoming even more unhappy and dropping off the network completely for an hour or so. The disruption also affected the print service on sweeney but that is also now working again. However the underlying hardware fault on odin has not yet been located so there might be some more disruption. Please check the
IT Status page for updates.
Serious fault on odin - all data unavailable
--
DanBretherton - 18 Apr 2012
Here we go again. Please check the
IT Status page for updates. This fault is probably related to the tape drive I replaced five minutes before so I am hoping there might be a quick SCSI related fix. I am waiting for Oracle to respond to my service request. In the meantime I have reverted to our temporary home directories on the storage cluster to enable access to the cluster data. Please use the work-around described in the IT News update "More odin hardware maintenance - UPDATE" on the 16th February.
NX users should log in to jupiter and saturn using their University/ITS accounts rather than their ESSC accounts, otherwise performance and responsiveness are likely to be very poor. No ESSC software packages are available this time, but the Meteorology packages are available via
setup in the Korn shell on the Rocks cluster compute nodes; enter command
ksh to start a new Korn shell. For the Met. department's
setup command to work you must have the standard .kshrc and .profile in your temporary home directory. I copied these when I created the temporary homes last time, but if for some reason you don't have these files they can be copied using the commands below.
cp -f /opt/metadmin/config/env.profile ~/.profile
cp -f /opt/metadmin/config/env.kshrc ~/.kshrc
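If you are not sure whether the files are already present, a small wrapper can copy each one only when it is missing. This is a sketch rather than a supported script: the function name copy_if_missing is made up for illustration, and only the two source paths come from the commands above. Unlike "cp -f", it leaves an existing file untouched.

```shell
#!/bin/sh
# copy_if_missing SRC DEST: copy SRC to DEST unless DEST already exists.
# (Hypothetical helper, not part of the Met department's setup.)
copy_if_missing() {
    src=$1
    dest=$2
    if [ -e "$dest" ]; then
        echo "$dest already exists, leaving it alone"
    elif [ -r "$src" ]; then
        cp "$src" "$dest" && echo "copied $src to $dest"
    else
        echo "cannot read $src, nothing copied"
    fi
}

copy_if_missing /opt/metadmin/config/env.profile "$HOME/.profile"
copy_if_missing /opt/metadmin/config/env.kshrc "$HOME/.kshrc"
```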
Please note that we have not yet combined our software licenses with Meteorology so
please use software applications sparingly until odin is fully functional again, especially MATLAB, and close down all applications when not in use.
Meteorology server room maintenance on the 17th April - REMINDER
--
DanBretherton - 17 Apr 2012
Please remember that the following systems will be shut down from 16:00 today, in preparation for a power cut in the Meteorology area of the GU15 machine room.
Please also remember the network disruption in ESSC between 18:00 and 20:00. NX users can use jupiter instead of saturn while saturn is off line, and thin client users can also connect directly to medusa and other ESSC servers from xterm windows.
For further details see previous IT News updates on this subject.
WebVPN instructions for Ubuntu Linux
--
DanBretherton - 17 Apr 2012
Antony Worrall in the School of System Engineering kindly sent me a link to their
WebVPN configuration instructions for 32-bit and 64-bit versions of Ubuntu Linux. Further information about off-site access to ESSC can be found
here.
Meteorology server room maintenance on the 17th April - REMINDER AND UPDATE
--
DanBretherton - 16 Apr 2012
I have just been informed by IT Services that the power cut in GU15 tomorrow will also affect the ESSC network. In addition to the systems and services already scheduled for down time (see 3rd April IT News update)
there will be no network communication between ESSC and the outside world (including the rest of the university) between 18:00 and 20:00 on Tuesday the 17th April. Network communication within ESSC should be considered
at risk during this period although it is expected to remain active. For the full list of affected systems please see the IT News update on this subject from the 3rd April.
Adobe Creative Suite licenses - REMINDER
--
DanBretherton - 13 Apr 2012
Please can all Windows and Mac OS X users tell me whether or not they need to have Adobe Creative Suite installed by Monday the 16th April at the latest. Please see the previous IT News update on the subject (3rd April) for further details.
Wireless network upgraded
--
DanBretherton - 13 Apr 2012
Two new wireless access points (WAPs) have been installed in an attempt to cover all parts of ESSC. One is in the computer lab and the other is in the upstairs printer room, and the original WAP has been moved out into the corridor outside the conference room in order to provide coverage on both sides of the bend. The signal strength does vary quite a lot in different parts of the building, but the important thing is that it should be strong enough to allow everyone to work effectively via the wireless network if necessary. Therefore, please can everyone with a laptop test the wireless network and provide me with some feedback. I would appreciate some feedback from every office, even if there are enough cable network connections to make use of the wireless network unnecessary. The wireless network must be effective in order to avoid expensive cable network upgrades in the future. The most important parameter is the connection speed, which ideally should be 54Mbit/sec but can usually drop down to 36Mbit/sec without causing any noticeable problems. I particularly need to know if the reported speed drops to 24Mbit/sec or lower, but
please make sure that your wireless network adapter is using the 802.11g protocol and not 802.11b, which is limited to 11Mbit/sec. Please refer to the
wireless network section of the wiki for instructions on how to do this.
One way to test the effectiveness of the connection is to watch a streaming video broadcast such as a webinar or BBC IPlayer. You might notice variations in signal strength over time, which are likely to be caused by interference from access points in neighbouring departments. This is a known problem, and if the effect is severe then there are things that ITS can do to reduce it. Any comments on the effectiveness of the wireless network would be much appreciated.
Meteorology account holders: decisions required
--
DanBretherton - 12 Apr 2012
The following applies to every ESSC user who has an account in the Meteorology department with a user ID that matches their University/ITS user ID. NCAS account holders with NCAS style user IDs are not affected.
As everyone is already aware (I hope), we will soon be using our University/ITS user accounts to log in to Linux and UNIX servers at ESSC. At the moment, ESSC users with Met department accounts have a different home directory when they log in to Met department servers. However, it would be simpler if everyone just had one home directory on all servers in the merged department, and this will actually become a technical necessity at some point in the future. Therefore, we would like to ask all ESSC users with Met department accounts to decide which home directory they wish to use from now on. The choice is between the new University/ITS home directories (which are the same as the University
NetDrives or "N-drives") and the Met department home directories; only one can be mounted in /home/<USER_ID> for each user.
If you are a Met department account holder please decide which home directory you would like to use as soon as possible, and
copy all the data you wish to be in your home directory via scp or sftp. The directory you choose will be used when you log on to ESSC and Met department servers. The University
NetDrive is probably the best choice because it provides hourly, nightly and weekly snapshots (i.e. backups) and can be accessed remotely via any web browser. If you decide to use your
NetDrive as your home directory your current Met department home directory will be deleted, and when this happens is up to you to discuss with the Met department IT team. If you choose this option, please copy your home directory data to your
NetDrive (mounted as /home/<USERID> in ESSC) as soon as possible and
tell the Met department IT team when you have done it. If you prefer to use your Met department home directory instead, your
NetDrive will continue to be accessible through web browsers or from Windows via "Map network drive".
If you want to keep your current Met department home directory you must tell the Met department IT team as soon as possible.
Another choice to make is which shell environment to use by default. C-shell is the default for most ESSC users but only the Korn shell is supported in Meteorology. The new accounts have been populated with Met department Korn shell login scripts as well as ESSC style Bash and C-shell scripts, so it doesn't really matter what your default shell is. For example, if your default shell is the Korn shell and you want to use the ESSC
setup commands, all you have to do is start a new C-shell with the command
tcsh. The opposite is also true; if C-shell is your default you can use the Met style
setup by starting a new Korn shell with
ksh.
We need answers fairly quickly because of planned work to integrate saturn into the Met department's automatic server configuration and update mechanism. Therefore, please can all Met department account holders reply to
itmet@readingNOSPAMPLEASE.ac.uk with answers to the following questions as soon as possible, and
no later than Friday the 23rd April.
1) Do you want to use your Met. department home directory or your University/ITS
NetDrive home directory when you login to ESSC and Met department servers? A null response means "I would like to use my University/ITS
NetDrive as my home directory, and I have already copied my data to my
NetDrive so it is safe to delete my Met department home directory whenever you like".
2) Which default shell do you want to use when you login to ESSC and Met department servers? The choice is between Bash, the C-shell and the Meteorology supported Korn shell. A null response means "I would like to use the Korn shell as my login shell on ESSC and Met department servers".
Rocks Cluster user guide updated
--
DanBretherton - 05 Apr 2012
The main changes to the
ESSC Rocks Cluster user guide concern interactive use of compute nodes, with warnings about long running tasks being vulnerable to loss of connection and suggestions for protecting them against accidental termination. The techniques described are useful if you want your cluster jobs to carry on running when jupiter, saturn or medusa are restarted.
There has also been a change to the use of batch mode, which now requires that "-l batch" is supplied as an option to
qsub. This limits batch jobs and array jobs to one instance per physical CPU core, in order to avoid too many jobs trying to start at once and overloading the storage cluster I/O capabilities.
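As an illustration of the new requirement, the sketch below writes a trivial job script and submits it with the "-l batch" option. The script name and its contents are placeholders, and qsub is only called if the command is available on the machine where you run the example; otherwise the command line is just printed.

```shell
#!/bin/sh
# Write a trivial job script (hypothetical name and contents).
job=$(mktemp "${TMPDIR:-/tmp}/demo_job.XXXXXX")
cat > "$job" <<'EOF'
#!/bin/sh
# Placeholder workload; replace with your own program.
echo "running on $(hostname)"
EOF
chmod +x "$job"

# "-l batch" is the option that batch jobs now have to supply to qsub.
submit="qsub -l batch $job"
if command -v qsub >/dev/null 2>&1; then
    $submit
else
    echo "qsub not found; on the cluster you would run: $submit"
fi
```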
Data access changed on saturn
--
DanBretherton - 05 Apr 2012
I have not been able to make ESSC-style auto-mount or access to the ESSC storage cluster work properly on saturn, the Meteorology flavoured remote desktop and Sunray terminal server. Therefore, I have decided to stop saturn from mounting cluster volumes altogether, which should prevent NX logins from timing out and active sessions from becoming slow and unresponsive. To access storage cluster volumes and individual data servers from an NX remote desktop session on saturn or from a Sunray terminal, please first connect to a Rocks Cluster compute node via SSH, as described in the
Rocks Cluster section of this wiki.
PLEASE SUSPEND/SHUT DOWN AND TURN OFF EVERYTHING YOU CAN BEFORE GOING AWAY FOR EASTER
--
DanBretherton - 03 Apr 2012
Power saving guidelines, including information about suspending and hibernating computers and thin clients, can be found in the
Green Impact wiki web.
Recent storage cluster disruption
--
DanBretherton - 03 Apr 2012
The disruption to storage cluster access during the past two weeks has been mainly caused by one of the storage servers being off line (see 22nd March
IT Status update). All the data has remained accessible, but there have been frequent periods of poor performance and lack of responsiveness caused by communication with the off line server being attempted and then timing out. The new network cables that I fitted recently have all been tested and are functioning normally, so these have been ruled out as a possible cause of the disruption. I am expecting to have the new disk drives for bdan12 before Easter, which will hopefully enable me to re-synchronise the off line storage bricks with their mirrors during the Easter holiday.
Easter maintenance work
--
DanBretherton - 03 Apr 2012
I am planning to carry out the following operations over the Easter holiday, starting after 6PM on Wednesday the 4th April. Please check the
IT Status page for progress updates.
- Restart saturn, to activate a change to the storage cluster access mechanism that will bring it into line with jupiter and the Rocks compute nodes.
- Restart jupiter to apply system updates.
- Re-synchronise the storage bricks on bdan12 that are currently off line due to multiple failing disk drives (see "Storage cluster disruption" above). This might cause some disruption while it is going on but I will try to keep it to a minimum.
- Change the ownership of everyone's data to match the new University user accounts. This depends on the storage server bdan12 being back on line and fully synchronised in time. It is difficult to predict how long this process will take and it might not be possible to complete everyone's accounts during the university closure period. I will do my best to make sure that nobody's data is only half finished when we return to work on the 11th, and I will notify everyone individually once the ownership change of their data has been completed. If you don't hear from me it means your data has not been changed. If you need to use your data over the holiday then please let me know as soon as possible, suggesting an alternative weekend period. If you don't want me to change the ownership of your data over Easter but don't specify an alternative weekend I will choose one myself.
Meteorology server room maintenance on the 17th April
--
DanBretherton - 03 Apr 2012
All power to the Meteorology area in the GU15 server room will be cut at 5PM on Tuesday the 17th April, in preparation for the installation of the JASMIN storage and compute cluster. This will affect the following ESSC systems, which will be shut down between 16:00 and 17:00 on the 17th.
The ESSC rack will have to be moved as part of the JASMIN installation work, and we plan to attempt this on the Wednesday morning before powering up the servers again. The trouble is that it weighs a ton and hasn't got any wheels, so it might be necessary to remove most of the servers to avoid giving ourselves hernias. Therefore, I can't predict how long it will be before behemoth and the
ClusterVision cluster are back on line and it might not be possible to complete the work before closing time (5PM).
As usual, please check the
IT Status page for progress updates.
Adobe Creative Suite licenses
--
DanBretherton - 03 Apr 2012
We recently discovered that ESSC only has a single computer license for the Adobe Creative Suite. To save Kevin from being arrested we are going to buy a license for everyone who needs to use the software regularly, but at £277 per license we don't want to buy more than we need. The software is installed on the Windows PCs in the computer lab and on one of the spare laptops, and occasional users will be able to use one of these computers rather than having it installed on their own PC. Please tell me as soon as possible if you use Adobe Creative Suite regularly and would like to keep it installed on your PC.
The deadline for requesting an Adobe Creative Suite license is Monday the 16th April. A null response means the following: "Please uninstall Adobe Creative Suite from my PC because I am happy to go into the computer lab or borrow the spare laptop when I want to use it".
Canon copier can now send scanned documents by e-mail
--
DanBretherton - 03 Apr 2012
The Canon photocopier in the computer lab can now send scanned documents by e-mail instead of printing hard copies. This is the easiest way to use the device as a scanner, but the Microsoft Clip Organiser method is still useful if you need more control over how documents are scanned. Brief instructions for both ways to use the Canon as a scanner are in the
Scanner section of the wiki.
Data access testing on Rocks Cluster - UPDATE
--
DanBretherton - 03 Apr 2012
I have ended the data access test and all the compute servers are now accessing the storage cluster via NFS. Thanks to everyone who gave me feedback on the relative performance of their applications. The qlogin and qrsh commands now enable access to all four of the Rocks Cluster compute nodes, and no special paths are required to access the storage volumes via NFS. Storage cluster paths for Linux and Windows are described in the
storage cluster section of this wiki.
University account home directories are now 10GB in size
--
DanBretherton - 19 Mar 2012
The size of our University account home directories has been increased to the standard 10GB, as specified in the Meteorology Department's IT
Service Level Agreement. Please migrate all your important documents, scripts and source code to your University account as soon as possible, and certainly no later than the end of July when odin is due to be retired from service. The data in the ESSC account home directories will still be available when odin is taken out of service but it will no longer be backed up. Please refer to the
IT Merger section of the wiki for more information.
Konica Minolta printer kmcolour2 now prints double sided
--
DanBretherton - 16 Mar 2012
The "new" Konica Minolta colour printer downstairs,
kmcolour2, can now do double sided printing after being fitted with a duplexer. The print server settings have been altered to make double sided printing the default for this printer, but Windows and
MacOS printer settings might have to be altered separately (depending on whether or not the computer is configured to print directly to the printer). To change the default printing options for a printer in Windows, open "Printers and Faxes" (XP) or "Devices and Printers" (Vista/7). Right click on the printer you want to alter and select "Printing Preferences". The double sided printing option might be in a tab or section called "Finishing" or "Layout". Please ask Dan if you need help.
The only printer at ESSC that does not support double sided printing now is
hpcolour upstairs. This printer will be withdrawn from service as soon as the last spare toner cartridge has been used up, and we are down to the last one.
More odin hardware maintenance - UPDATE
--
DanBretherton - 01 Mar 2012
Odin is now working normally again, having passed a thorough I/O stress test that ran all day yesterday and overnight. Write access to ESSC home directories has been enabled.
I will shortly begin the process of changing data ownership to match the university/ITS accounts, as described in the
IT Merger section of the wiki.
More odin hardware maintenance - UPDATE
--
DanBretherton - 28 Feb 2012
1) I have just heard that a field engineer will be coming to ESSC this evening at 18:30. The planned work is likely to require odin to be shut down, and therefore ESSC home directories and other data on odin should be considered at risk from 18:30 until further notice.
2) Windows access to ESSC home directories has stopped working because of the temporary change of home directories (although it carried on working for several days after that change for some unknown reason). Until odin is fully functional again, the home directories can be mapped as Windows network drives using the following alternative path.
\\odin.nerc-essc.ac.uk\users
Use your ESSC username and password as normal. Once connected look for a folder matching your ESSC username.
3) To map your temporary home directory in Windows, follow the instructions for mapping storage cluster volumes in the
storage cluster user guide.
4) To map your University/ITS home directory in Windows, follow the instructions under "Migrating to University accounts" in the
IT Merger section of the wiki.
ESSC-Meteorology merger update
--
DanBretherton - 28 Feb 2012
I have updated the
IT Merger section of the wiki, with a new section about migrating to the University/ITS user accounts from ESSC accounts on odin. Please read this information carefully, particularly with regard to the amount of data that will be backed up under the terms of the
Meteorology Department IT Service Level Agreement (SLA).
More odin hardware maintenance - UPDATE
--
DanBretherton - 24 Feb 2012
An Oracle field engineer has just arrived and we may have to shut down odin at short notice. I don't know when this will be exactly, but please be aware that paths beginning with /users may be unavailable at times this afternoon. Hopefully this will lead to a permanent solution to odin's I/O problems. Please check the
IT Status page for updates.
Urgent Storage Cluster Maintenance affecting "atmos" volume
--
DanBretherton - 16 Feb 2012
I sent a note out to ESSC Clusters mailing list group members about this yesterday evening, but I have just realised that some members' email settings were set to "Digest" or "No email". I am sorry for this error in notifying you of the following problem with the "atmos" storage volume. Here is the content of yesterday afternoon's mailing list posting for those of you who didn't receive it. Just to be clear, the volume went read-only yesterday evening and I am hoping to enable write access later on today or tomorrow.
Dear All-
I will have to take the atmos volume off line this evening for urgent
maintenance. I will start work at 18:00 or soon after. Due to the
large number of files in the volume it could take up to 48 hours to
complete the required operations. I am going to try making the volume
available for reading during this time but you might find that some
files give I/O errors when you try to open or copy them. For those of
you who are using /glusterfs/atmos as your temporary home directory, I
will find another temporary home directory which should be available
by tomorrow morning. I realise this is the worst possible timing
given the odin situation, but unfortunately the atmos volume
maintenance really can't wait. The paths affected are as follows.
/glusterfs/atmos (=/data/atmos)
Note that some subdirectories of /data/remus, /data/scratch,
/data/gorgon and /data/cadfael are symbolic links to parts of the atmos
volume.
Please check the IT Status page at the URL below for progress updates.
http://www.resc.rdg.ac.uk/twiki/bin/view/ESSCComputing/CurrentStatus
-Dan.
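Because some subdirectories under the other /data paths are symbolic links into the atmos volume, it may not be obvious whether a particular directory is affected. The following is a minimal sketch of one way to check, assuming GNU readlink (with its -f option) is available. The function name resolves_under is hypothetical and is not an installed ESSC tool.

```shell
# Hypothetical helper: succeed if PATH resolves (after following symbolic
# links) to a location inside VOLUME_ROOT.
# Usage: resolves_under PATH VOLUME_ROOT
resolves_under() {
    # readlink -f follows every symbolic link and prints the final absolute path
    target=$(readlink -f -- "$1") || return 2
    root=$(readlink -f -- "$2") || return 2
    case "$target/" in
        "$root"/*) return 0 ;;   # target is the volume root or lies below it
        *) return 1 ;;           # target is somewhere else
    esac
}
```

For example, "resolves_under /data/scratch/<SUBDIR> /glusterfs/atmos && echo affected" would print "affected" only if that subdirectory lives on the atmos volume, where <SUBDIR> stands for one of your own directories.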
More odin hardware maintenance - UPDATE
--
DanBretherton - 16 Feb 2012
Recent news from Chile is not encouraging, but the good news is that I have found a temporary work-around for the file ownership problem. If you are a regular Linux/UNIX user your ESSC home directory is now your main directory on the storage cluster. For example, mine is /glusterfs/marine/users/dab. These are not new directories and no data has been transferred from the normal home directories on odin (i.e. /users/<YOUR_ESSC_ID>). This is just a temporary measure to allow you to run applications on the compute servers using your ESSC account while odin's data is still read-only.
The temporary home directories will not be backed up. Until odin is fully operational again, any new files you wish to have backed up must be saved in your University/ITS home directory (i.e. /home/<ITS_ID>). Only new files need to be saved in your Uni/ITS home directory to enable backups to take place; the read-only ESSC home directories on odin are still being backed up to tape every day.
I have populated the temporary home directories with the standard login scripts, but you may wish to copy over your usual scripts from /users/<YOUR_ESSC_ID>. It is possible to connect to jupiter and saturn via NX using your ESSC account, but when I tested this earlier I found performance to be annoyingly slow. (
GlusterFS was not designed for home directories, but performance for this type of usage should improve when our new fancy network switch arrives). Therefore I strongly suggest using your new University/ITS account for NX, and only using your ESSC account for logging on to medusa and then on to the compute nodes. For example, I would enter the following command on jupiter to log on to medusa as ESSC user dab.
[sms05dab@jupiter ~]$ ssh -fY dab@medusa gnome-terminal
The "setup" command now works again on all the compute servers.
Please note that file ownership will still need to change to match the University/ITS user accounts, but given the above work-around you may wish to reconsider your schedule. However, if I don't hear from you by the end of this week I will still assume that you are happy for me to change your file ownership during any weekend (starting on a Friday unless notified otherwise).
More odin hardware maintenance - UPDATE
--
DanBretherton - 15 Feb 2012
An Oracle support engineer in Chile is looking after our technical support request! Unfortunately we will have to wait until this evening before he can action a hardware repair, and he's not yet sure what the problem is.
I have enabled read-only access to odin's data via NFS and Samba (i.e. Windows). It isn't possible to login via NX to ESSC user accounts, but I have set up Linux access to University accounts which mount our N-drives as home directories. These are currently only 3GB in size, but will be extended to 10GB when ITS has reorganised the
NetApp storage to accommodate the larger drives for us. We will be using our University accounts for Linux access permanently when odin is retired from service at the end of July. The ESSC home directories can be accessed at /users/<ESSC_USER_ID> as usual, and once connected to jupiter or saturn it is possible to run commands in your ESSC account by entering the command "su - <ESSC_USER_ID>". Note the spaces either side of the '-' symbol. This command will not work on medusa or any of the Rocks Cluster nodes.
The Rocks Cluster and compute servers nemo1-3 are fully operational. Remember that "qlogin" or "qrsh" currently only give access to compute-0-0 and compute-0-1, as explained in the "Data access testing on Rocks Cluster - UPDATE" IT News update on the 12th February. To log on to compute-0-2 or compute-0-3 use the commands "qlogin -l nfs" or "qrsh -l nfs". See the
Rocks Cluster user guide for more information.
The data on the storage cluster and individual storage servers is writable as usual, but the ownership will have to be changed to match the new user accounts. Unfortunately only root can change file ownership so I will have to do it for you. A quick way to start your new user account would be to have the ownership of your directories changed first. This would allow you to create new files and directories but not to delete or modify existing ones, and normally only takes a few hours at the most.
Please tell me when you would like the ownership of your data changed, and whether or not you would like the directory ownership to be changed first. Bear in mind that it can take several days to change the ownership of thousands of files, and remember that once the data belongs to your new account it will not be possible to delete or modify files or directories from your old ESSC account. When the ownership of all your data has been changed, the UID of your ESSC account will be changed to match the UID of your University account, and then both accounts will be able to modify the data. Before odin is retired at the end of July, user account control will be transferred to Meteorology's NIS server. The ESSC accounts will not be transferred but the old home directories will remain available.
If I don't hear from you THIS WEEK I will assume you want me to change the ownership of all your data as soon as possible. I will start this on a Friday afternoon and notify you by email when the process is about to start.
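Since changing the ownership of thousands of files can take several days, it may be useful to see how much of a directory tree still belongs to the old account. The following is a minimal sketch, assuming a standard find command. The function name is hypothetical, and it only reads file metadata: it does not change ownership, which only root can do.

```shell
# Hypothetical helper: count the files and directories under DIR that are
# NOT yet owned by USER (a user name or a numeric UID).
# Usage: count_not_owned_by DIR USER
count_not_owned_by() {
    # find prints one line per entry whose owner differs from USER;
    # wc -l counts those lines
    find "$1" ! -user "$2" -print | wc -l
}
```

For example, "count_not_owned_by /glusterfs/marine/users/dab <ITS_ID>" would print 0 once everything in that directory belongs to the University/ITS account. Walking a very large tree takes time, so this is best run occasionally rather than in a loop.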
More odin hardware maintenance - UPDATE
--
DanBretherton - 13 Feb 2012
- odin will be shut down briefly again TODAY at 13:00
- odin will be shut down for up to one hour TOMORROW at 12:30
Data access testing on Rocks Cluster - UPDATE
--
DanBretherton - 12 Feb 2012
The NFS data access testing on the Rocks Cluster has been extended to compute-0-2, in addition to compute-0-3 as announced in the "Data access testing on compute-0-3" news article on the 6th February. The compute servers nemo1, nemo2, nemo3 are also now mounting all storage cluster volumes via NFS. Further details can be found in the
Rocks Cluster User Guide under the heading "Data Access".
More odin hardware maintenance
--
DanBretherton - 10 Feb 2012
There will be a short interruption to odin's services on Monday the 13th February at 13:00, to enable it to detect its new hard drive. The work should only take a few minutes and may involve another restart. The following services will be affected.
- Home directories (i.e. /users/<USER_NAME>)
- /data/scratch, /data/cadfael
- Scientific software (e.g. MATLAB, IDL)
- New network connections (e.g. laptops)
- Internet access
Please check the
IT Status page for progress updates.
Odin hardware maintenance this week - UPDATE
--
DanBretherton - 09 Feb 2012
Odin will be restarted TODAY at 12:00 in order to restart several services (including backups) that crashed last night. If all goes well there should only be a short pause in access to home directories and other services, and all running applications should continue as normal when odin is running again.
The Oracle hardware engineer will be coming to ESSC to repair odin TOMORROW MORNING (Friday) at 10:00. Odin's services (see below) should be considered at risk until the work has been completed. I don't know exactly how long this will take but I expect it will be finished by lunch time. Please check the
IT Status page for progress updates.
Please note that user authentication should continue to work when odin is down but home directories will not be accessible. This might cause NX desktops to freeze, but HP thin client users can continue to make SSH connections using the built in xterm application as described in the
Thin Client User Guide. The built in web browser and connections to prufrock will also continue working.
Odin hardware maintenance this week
--
DanBretherton - 06 Feb 2012
I am trying to arrange for two hardware faults on odin to be repaired by Oracle engineers this week. I am hoping to arrange for both faults to be repaired at the same time to avoid shutting down odin twice. However, arranging this with Oracle is proving to be a real struggle so I can't make any promises. I will try to give as much notice as possible before odin is shut down, but I can only give as much notice as Oracle gives me. I can't put the repairs off any longer, so if they insist on two separate engineers' visits then odin will have to be shut down twice I'm afraid.
When odin has been shut down the following data and services will be unavailable.
- Home directories (i.e. /users/<USER_NAME>)
- /data/scratch, /data/cadfael
- Scientific software (e.g. MATLAB, IDL)
- New network connections (e.g. laptops)
- Internet access
Please check the
IT Status page for progress updates.
Data access testing on compute-0-3
--
DanBretherton - 06 Feb 2012
A new way of mounting storage cluster data on the Rocks compute nodes is being tested on compute-0-3. All the data is now mounted via NFS rather than using the
GlusterFS client as before. Recent tests have shown that many applications run a lot faster via NFS, but previous attempts to use NFS failed because a single storage server couldn't cope with the NFS load for the whole cluster. I have now come up with a way to balance the NFS load more evenly, and I would like as many people as possible to test their applications on compute-0-3. To login to the NFS test node (compute-0-3) instead of the other Rocks compute nodes, the following command should be entered on medusa.
qlogin -l nfs
The character in the flagged option is the lower case 'L', not the figure '1'. Alternatively, if you normally use qrsh to login to the compute nodes you can do "qrsh -l nfs" instead.
When you have tested your applications on the NFS node, please tell me if there were any problems accessing the data, and whether or not the applications ran any faster. If the tests are successful during the next month I will set up load balanced NFS mounting on all the nodes.
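To judge whether an application runs faster on the NFS node, the same command can be timed on compute-0-3 and on one of the other compute nodes. The following is a minimal sketch, assuming the date command supports the +%s format (as GNU date does). The function name elapsed_seconds is hypothetical, and the result is in whole seconds, so it is only meaningful for runs lasting more than a few seconds.

```shell
# Hypothetical helper: run a command, discard its output, and print the
# wall-clock time it took in whole seconds.
# Usage: elapsed_seconds COMMAND [ARGS...]
elapsed_seconds() {
    start=$(date +%s)            # seconds since the epoch before the run
    "$@" > /dev/null 2>&1        # run the command exactly as given
    end=$(date +%s)              # seconds since the epoch after the run
    echo $((end - start))
}
```

For example, "elapsed_seconds cat <LARGE_FILE>" times a plain read of a file on the storage cluster, where <LARGE_FILE> is a placeholder for one of your own data files. A second read of the same file on the same node may be served from memory rather than from the storage cluster, so first runs give the fairest comparison.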
New applications on jupiter after system update
--
DanBretherton - 22 Jan 2012
The latest versions of Firefox and Thunderbird have been installed and the Google Chromium web browser is now available. Chromium can be launched from the "Network" section of the XFCE Main Menu, or by using the command "chromium".
LibreOffice has replaced
OpenOffice, as described in a new section of the
NX Remote Linux Desktops section of this wiki web. The new section explains how to open office documents from applications such as Firefox and Thunderbird if the file associations are not set up properly.
URGENT: Home directory data transfers and other heavy I/O operations
--
DanBretherton - 10 January 2012
Please don't perform heavy I/O operations involving the Linux/UNIX home directories. These operations include the transfer of large amounts of data (i.e. several gigabytes or more) and the use of home directories for input to, or output from models or data analysis applications. This type of usage has a serious impact on performance, in particular the usability and responsiveness of Linux desktops and interactive applications. The load on odin and other servers can be seen in the
Ganglia monitoring charts. Click on a chart to see more details of that particular machine, including system CPU load (coloured red on the CPU chart) and network load, both of which increase noticeably when the NFS load increases.
Some heavy I/O operations that started today at about 13:00 have made Linux desktops run very slowly and prevented several users from logging on. The storage cluster is the appropriate resource for I/O intensive applications. Please come and talk to me if you don't currently have a directory on the cluster.
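Before copying a directory to or from a home directory, it can help to check how much data is involved. The following is a minimal sketch, assuming the standard du and cut commands. The function name at_least_gb is hypothetical, and the default limit of 2 gigabytes is an arbitrary stand-in for "several gigabytes", not an official threshold.

```shell
# Hypothetical helper: succeed if DIR holds at least LIMIT_GB gigabytes.
# LIMIT_GB defaults to 2, an arbitrary illustrative value.
# Usage: at_least_gb DIR [LIMIT_GB]
at_least_gb() {
    limit_kb=$(( ${2:-2} * 1024 * 1024 ))
    # du -sk prints the total size in kilobytes, a tab, then the path
    size_kb=$(du -sk -- "$1" | cut -f1)
    [ "$size_kb" -ge "$limit_kb" ]
}
```

For example, "at_least_gb <DIR> && echo 'Use the storage cluster instead'" prints the reminder only when the directory is large, where <DIR> is a placeholder for the directory you intend to transfer.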
Thin client user guide
--
DanBretherton - 10 January 2012
A
user guide covering the HP thin clients and Sunray terminals is now available on the ESSC Computing wiki.
IT News from previous years