Managing Data Files

Introduction/Scope

This page documents how to tidy up files and directories in data areas. It contains specifics for the LHCb and ATLAS data areas, but some of the information is also useful for other data areas and home directories.

NOTE: Data areas are not backed up!!

Data areas are very large, so it is not feasible to back them up. Home directories are subject to quotas, relatively small, and backed up nightly. Copy important results to your home directory for safekeeping. Raw data should not be kept in home directories.

ATLAS/LHCb Specifics

The ATLAS and LHCb data areas are made up of a number of individual NFS exports, named /data/atlas00 to /data/atlas04 and /data/lhcb00 to /data/lhcb03. To simplify day-to-day use, the top-level NFS exports /data/atlas and /data/lhcb contain only symbolic links to the actual user data areas on the individual NFS shares. This is shown in the figure below:

[Figure: atlas-nfs-500-48932.png, layout of the ATLAS NFS exports and symbolic links]

So for example:

/data/atlas/users/auser --> /data/atlas01/users/auser

/data/atlas/users/auser is a symbolic link to the actual user area /data/atlas01/users/auser. In normal use, the symbolic link should be used to access files and it is not important to know the actual data area. The use of a symbolic link allows the user area to be moved to, for instance, a newer server, while the path to access the files remains unchanged.

Note that on the atlas shares there are also symbolic links from the old atlas data areas: /data/atlas/atlasdata, /data/atlas/atlasdata2 and /data/atlas/atlasdata3. These links allow old scripts pointing to these areas to continue to function. However, the use of these links is deprecated, and they will be removed at some point in the future.
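
Since the old links will eventually go away, it can be worth checking whether any of your scripts still use them. The following is a minimal sketch, not an official tool: the function name find_deprecated_paths is made up for illustration, and the directory you pass in is wherever you happen to keep your scripts.

```shell
# List every line under a directory that still mentions a deprecated link.
# All three old names start with /data/atlas/atlasdata, so one pattern
# covers atlasdata, atlasdata2 and atlasdata3.
# find_deprecated_paths is a hypothetical helper name.
find_deprecated_paths() {
    grep -rn '/data/atlas/atlasdata' "$1"
}
```

For example, find_deprecated_paths ~/scripts would print the file name, line number and offending line for each match, assuming your scripts live in ~/scripts.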

It is only important to know how these links work when checking how much space is left on the drive serving your data area. Checking the space available on /data/atlas will not show you how much space is available for your area. You must check the drive your area is on, like so:

> df -h /data/atlas/users/auser
Filesystem                                    Size  Used Avail Use% Mounted on
pplxfsatlas01.physics.ox.ac.uk:/data/atlas01  164T  150T   15T  92% /data/atlas01

Checking /data/atlas will only show how much space is left for symbolic links!!
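
To see at a glance which share a path really lives on, the mount point can be picked out of the df output. This is a small sketch assuming a POSIX df and awk; the function name serving_share is invented here for illustration.

```shell
# Print the mount point of the filesystem that actually holds a path.
# df follows the symbolic link, so a user-area link reports its real share.
# -P forces the POSIX one-line-per-filesystem output, which is easier to parse.
# serving_share is a hypothetical helper name.
serving_share() {
    df -P "$1" | awk 'NR==2 { print $6 }'
}
```

Going by the df output above, serving_share /data/atlas/users/auser would be expected to print /data/atlas01 for the example user.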

Tools to Manage your Data Area

How much space is left?

You can see how much space is used on the disk your area resides on by using the df command. Make sure you use the full path to your data area, like so:

> df -h /data/atlas/users/auser
Filesystem                                    Size  Used Avail Use% Mounted on
pplxfsatlas01.physics.ox.ac.uk:/data/atlas01  164T  150T   15T  92% /data/atlas01
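
If you want a quick check before starting a large job, the Use% column can be compared against a threshold. The sketch below assumes a POSIX df and awk. The function name check_space and the 90% default are illustrative choices, not site policy.

```shell
# Warn when the filesystem holding a path is fuller than a threshold.
# check_space is a hypothetical helper; 90 is an arbitrary default.
check_space() {
    path=$1
    limit=${2:-90}
    # Take the Use% field from the data line and strip the percent sign.
    used=$(df -P "$path" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is on a filesystem that is ${used}% full"
        return 1
    fi
    echo "OK: $path is on a filesystem that is ${used}% full"
}
```

With the figures shown above, check_space /data/atlas/users/auser would report a warning, since 92% is above the 90% default.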

How much space am I using?

There are multiple ways to check how much space you are using. The most straightforward is to use the quota command on pplxint10 or pplxint11. Quotas are enforced on the home-directory server, and enabled but not enforced on the atlas and lhcb servers.

> quota -s
Disk quotas for user auser (uid 25854):
     Filesystem   space   quota   limit   grace   files   quota   limit   grace
pplxfshome01.physics.ox.ac.uk:/home
                 21703M      0K  29297M           64565       0       0
pplxfsatlas01.physics.ox.ac.uk:/data/atlas01
                 31789K    100T      0K               2       0       0
pplxfslhcb01.physics.ox.ac.uk:/data/lhcb01
                 40567K    100T      0K               5       0       0

The -s option gives human-friendly units. On each data line, the first value (the space column) shows the space used by you on that filesystem, and the fourth value (the files column) shows the number of files you own there.
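
Because the filesystem name and its numbers sit on separate lines, the output is awkward to read in a script. The following sketch condenses it to one line per filesystem. It assumes the layout shown above, where the grace columns are empty; if you are over quota the fields shift and this will misreport. The name quota_summary is made up for illustration.

```shell
# Condense "quota -s" output to: filesystem, space used, number of files.
# Assumes the filesystem name is on its own line, followed by a line of
# values with empty grace columns (as in the example output).
# quota_summary is a hypothetical helper name.
quota_summary() {
    awk '
        NF == 1 && $1 ~ /:\// { fs = $1; next }           # remember the filesystem
        fs != "" && NF >= 4   { print fs, $1, $4; fs = "" } # space and files
    '
}
```

It reads from standard input, so it would be used as quota -s | quota_summary.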

This does not necessarily equate to how much space is taken up in your data area. Someone else may own files in your area, and you may own files in someone else's.
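
To see whether anyone else owns files inside your area, find can filter on ownership. This is a minimal sketch using only standard find options; the function name foreign_files is invented for illustration.

```shell
# List everything under a directory that is NOT owned by the given user.
# foreign_files is a hypothetical helper name.
foreign_files() {
    find "$1" ! -user "$2" -print
}
```

For example, foreign_files /data/atlas/users/auser auser would list files in that area belonging to other users. On a large area this walks every file, so it can be slow.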

To find how much space is taken up by your user area, use the du (disk usage) command like so:

First find the disk your area is being shared from:
> ls -alh /data/atlas/users/auser
lrwxrwxrwx 1 root root 26 Oct 14 14:55 /data/atlas/users/auser -> /data/atlas01/users/auser

This user is being served from /data/atlas01 so:
> du -sh /data/atlas01/users/auser
72G /data/atlas01/users/auser
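
The two steps above can be combined by letting readlink resolve the link. This sketch assumes GNU readlink (for the -f option), which is standard on Linux; the function name area_usage is made up for illustration.

```shell
# Report the total size of a user area, resolving the symbolic link first.
# area_usage is a hypothetical helper name; readlink -f is the GNU option
# that prints the fully resolved path.
area_usage() {
    real=$(readlink -f "$1") || return 1
    du -sh "$real"
}
```

Used as area_usage /data/atlas/users/auser, this should give the same result as running du on the /data/atlas01 path by hand.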

Note that if you have a lot of files (some users have many millions), the command may take a very long time to complete.
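
For long scans it may help to run du at low priority and keep the answer in a file, so that you do not have to repeat the scan. This is only a suggestion, sketched with the standard nice command; the function name du_to_file and the output file are illustrative.

```shell
# Run du at the lowest CPU priority and save the result to a file.
# du_to_file is a hypothetical helper name.
du_to_file() {
    nice -n 19 du -sh "$1" > "$2"
}
```

For example, du_to_file /data/atlas01/users/auser ~/du_auser.txt & runs the scan in the background and leaves the total in ~/du_auser.txt.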

How can I analyse my disk usage?

To find out how much each sub-directory is using, cd into the directory and use the du command like so:
> cd /data/atlas/users/auser
> du -sh *
72G importantdata
34T myrubbish
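
With many sub-directories it helps to sort the list so the largest come first. The sketch below assumes GNU sort, whose -h option understands sizes like 72G and 34T. The function name biggest_subdirs is invented for illustration. Note that * does not match hidden files and directories.

```shell
# Show the largest entries in a directory, biggest first.
# biggest_subdirs is a hypothetical helper; $2 is how many lines to show.
biggest_subdirs() {
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$1" && du -sh -- * | sort -rh | head -n "${2:-10}" )
}
```

For example, biggest_subdirs /data/atlas/users/auser 5 would list the five largest entries in the example area.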

It can be time consuming to go through and run du for each of your sub-directories. Running the command ncdu on your top level directory will allow you to browse through your directory structure to better see where the space is being taken up:

> ncdu -r /data/atlas/users/auser
ncdu 1.17 ~ Use the arrow keys to navigate, press ? for help [read-only]
--- /data/atlas01/users/auser -------------------------------------------------
34.5 TiB [###########] /myrubbish
72.1 GiB [# ] /importantdata

The -r option provides a read-only interface, so that you cannot delete things by accident. It can take a very long time to scan through your files, so it is annoying to accidentally exit the app and have to scan again. It is better to dump the directory database to a file and then work on that. You can come back to it at a later date without having to scan again:

> ncdu -ro ncdu_data_file /data/atlas/users/auser
> ncdu -rf ncdu_data_file
ncdu 1.17 ~ Use the arrow keys to navigate, press ? for help [read-only]
--- /data/atlas01/users/auser -------------------------------------------------
34.5 TiB [###########] /myrubbish
72.1 GiB [# ] /importantdata
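
If you scan regularly, putting the date in the dump file name makes it easy to tell old scans from new ones. The following is a sketch only: the naming scheme and the function names snapshot_name and scan_area are made up for illustration, and it relies on the same ncdu -o and -f options used above.

```shell
# Build a dated file name for a scan, e.g. ncdu_auser_2024-01-31.
# The naming scheme is a hypothetical convention, not a site standard.
snapshot_name() {
    printf 'ncdu_%s_%s\n' "$(basename "$1")" "$(date +%Y-%m-%d)"
}

# Scan an area once into a dated file and say how to browse it later.
# scan_area is a hypothetical helper name.
scan_area() {
    area=$1
    out=$(snapshot_name "$area")
    ncdu -o "$out" "$area" && echo "Browse later with: ncdu -rf $out"
}
```

Running scan_area /data/atlas/users/auser would write the dump file into the current directory, so run it from somewhere you have space to spare.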

