Berg




Berg is a Linux-based, 8.75 TFLOPS SGI Altix ICE cluster consisting of 70 compute nodes.


HARDWARE

The machine lives at the Begbroke Science Park.
There are 70 compute nodes and 2 "service/login" nodes (used for compiling & running short programs).
The login nodes are called berg1.physics.ox.ac.uk & berg2.physics.ox.ac.uk. You can only ssh straight into berg1; from there the two login nodes appear as 'service0' and 'service1'.
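For example, to log in from your own machine (replace "yourUsername" with your own username - it's just a placeholder here):

ssh yourUsername@berg1.physics.ox.ac.uk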

The 70 compute nodes are dual-CPU, blade-like machines, each with two six-core Intel Xeon E5650 CPUs (2.67 GHz) and 48 GB of RAM shared between the 12 CPU cores.
This gives us 70 x 12 = 840 CPU cores to run on.
The nodes are connected in an all-to-all dual-rail QDR InfiniBand network for very fast communication between them.

Home directories are stored on a Panasas storage system, which has a total of 100 TB available. There are also additional storage areas (/yellowwhale, /bluewhale, /greenwhale, /redwhale and /blackwhale) with a further 180 TB between them. Backup space is very limited, though.
The contents of the "backup_this" directory in your home directory are backed up every 2 weeks; use it for important bits of source code, parameters and other small things you want to keep extra safe.
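For example, to put a copy of some source code where the backup will pick it up (the "mySourceCode" directory name is just a placeholder):

cp -r mySourceCode ~/backup_this/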

COMPILING

We've got two sets of compilers: the GNU compilers (gcc, gfortran, etc.) and the Intel compilers (version 11.1), which you can use with "icc" and "ifort".
If you're compiling an MPI program, the following commands will call the compiler, linking in the necessary MPI libraries:
mpicc, mpicxx, mpif90 (which all use the Intel compiler).
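For example, a typical MPI compile might look like this (the source and program names are just placeholders):

mpicc -O2 -o myProgram myProgram.c       # C
mpif90 -O2 -o myProgram myProgram.f90    # Fortran 90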
Feel free to email me (jop@astro.ox.ac.uk) if you have any problems compiling.

LIBRARIES

There are lots of libraries installed, usually under /panfs/panasas/home/sw. Have a look in there to see if it has what you want, and if you need anything else, let me know.
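If you find what you need in there, you can usually just point the compiler at it with -I and -L flags; "someLibrary" and "somelib" below are placeholders for whatever the real directory and library names turn out to be:

icc -I/panfs/panasas/home/sw/someLibrary/include -L/panfs/panasas/home/sw/someLibrary/lib -o myProgram myProgram.c -lsomelib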

THE JOB QUEUE

We use the Torque queueing system, with some extra queueing/display software I've written which runs on top of it.

SUBMITTING YOUR JOB TO THE QUEUE

mpisub "howLongItWillRunFor" numNodes x numCores ./myProgramName
eg.
mpisub "1 week" 2 x 8 ./doSomeAnalysis
will submit your program "doSomeAnalysis" to the default queue, requesting 2 compute nodes with 8 cores on each (so 16 processes are started). The "howLongItWillRunFor" part is there to let other users know roughly how long your job is expected to run, so they can see when theirs might start. You can also use it to add your own comments about the job - parameter numbers, maybe - to keep track of which job is which.
OR
mpisub "1 week" 1 ./doSomeAnalysis
would run a single-process (non-MPI) job.
OR
mpisub "1 week" 31 ./doSomeAnalysis
would submit a 31-core job to the queue without specifying how many cores to use per compute node, so they'll be allocated however the queue sees fit. How the cores are spread out only really matters if your job does a lot of MPI communication, in which case you'll want as many of them as possible running on each compute node, because MPI traffic that has to go over the ethernet is quite slow.
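For example, since each node has 12 cores, a communication-heavy 36-process job could request whole nodes so that all the cores on each node are yours:

mpisub "1 week" 3 x 12 ./doSomeAnalysis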
Once you've submitted it you'll see your job listed in the queue, and it'll have been given a job number. This is the same output you see when you type "st", which we'll go into now.

CHECKING YOUR JOB

st or status
will list your jobs in the queue. Use "st -a" if you want to see everybody else's jobs as well. You'll get a list of how many nodes they've requested and how long they've been queueing/running for. A red "Q" means the job is queueing and a green "R" means the job is running.
You can check the output (stdout, stderr, or "what would have been printed to the screen") of your job by typing:
showoutput jobNumber
This output is kept on one of the compute nodes until the job ends.
You can stop your job with "qdel jobNumber" at any time if you want.
Feel free to log in to the compute node running your job and see how it's doing. Have a look at which compute nodes it's running on ("qstat -f jobNumber") and then, for example, to take a look at node r1i0n4:

ssh r1i0n4
top
exit
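The node names appear on the exec_host line of the "qstat -f" output, so something like this (with a made-up job number) will pull them out:

qstat -f 12345 | grep exec_host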

JOB FINISHING

When your job finishes, the output from it is copied back to the directory you submitted the job from, with the names programName.sh.o????? (stdout) and programName.sh.e????? (stderr), where ????? is the job number.
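For example, if you submitted ./doSomeAnalysis and it ran as job 12345 (a made-up job number here), you'd look at:

less doSomeAnalysis.sh.o12345    # stdout
less doSomeAnalysis.sh.e12345    # stderr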

That's the theory, anyway. People have problems compiling and running, and things break sometimes, so please feel free to email me (jop@astro.ox.ac.uk) if you have any questions.