Questions on SPS monitoring

How are file types guessed?

The file types are guessed using the last part of the filename, also known as the extension. This is the part of the filename after the last dot (.), if any.

For some file types, the process tries to make educated guesses, for instance a file named Makefile is likely to be a configuration file for the make(1) program.

Some filename extensions are stripped from the filenames because they do not provide useful type information. These extensions are: .BAK, .bak, .back, .backup, .bck, .OLD, .old, .orig, .bz2, .gz, .Z, '(?:\.\d+)+$' (i.e. files with a name ending with . followed by one or more digits).

Multiple consecutive such extensions are also stripped, for instance: toto.txt.bak.Z will become toto.txt.
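The stripping rule above can be sketched in Python as follows. This is only an illustration, not the code used by the monitoring process: the function name strip_uninformative and the decision to leave a name untouched when stripping would empty it are assumptions.

```python
import re

# Suffixes listed in the text as carrying no useful type information.
LITERAL_SUFFIXES = (".BAK", ".bak", ".back", ".backup", ".bck",
                    ".OLD", ".old", ".orig", ".bz2", ".gz", ".Z")

# One trailing uninformative suffix: a listed literal, or a dot followed by digits.
STRIP_RE = re.compile(
    r"(?:" + "|".join(re.escape(s) for s in LITERAL_SUFFIXES) + r"|\.\d+)$"
)

def strip_uninformative(name):
    """Repeatedly drop trailing uninformative suffixes from a file name."""
    while True:
        stripped = STRIP_RE.sub("", name, count=1)
        # Stop when nothing was removed, or when removal would leave an empty
        # name (assumption: such a name is kept as it is).
        if stripped == name or not stripped:
            return name
        name = stripped

print(strip_uninformative("toto.txt.bak.Z"))  # toto.txt
```

The loop removes one suffix per pass, which is what makes consecutive suffixes such as .bak.Z disappear together.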

The current file categories are listed below. The file extensions are detailed after the category name; where present, the quoted parts of the descriptions are Perl regular expressions (PCRE syntax) used to match filenames:

  • Tarballs (compressed or not), zip files: .tar, .tgz, .tbz2, .zip;

  • Editor backup files, etc.: .bak, ‘/.+~$’, ‘/#[^/]+#$’;

  • DNA analysis files: .blast, .fasta, .phr, .phy, .pin, .psq;

  • C/C++ source files, ROOT C++ source code: .c, .C, .cc, .CC, .cpp, .cxx, .h, .H, .hh, .HH, .hpp, .hxx;

  • CMT control files: .cmt, .cmtref, ‘^requirements$’;

  • Configuration files: .cf, .cfg, .conf, .config, .ini;

  • Core files: ‘^core(?:\.\d+)?$’;

  • CVS control files: ‘/CVS/(?:Entries(?:\.Log)?|Root|Repository|Tag|Template)$’;

  • Undetermined data files: .dat, .data, .DAT, .DATA;

  • (B)DB files: .db;

  • Hidden files (a.k.a. dotfiles): ‘^\..+$’;

  • DOSisms: .dll, .exe;

  • FITS astronomical images: .fits, .fz, .head, .list;

  • FORTRAN source files: .f, .F, .f77, .f90, .inc;

  • HBook files: .hbk, .hbook;

  • HTML/Web files: .css, .htm, .html, .js, .json;

  • General purpose images: .GIF, .JPG, .PNG, .gif, .ico, .jpeg, .jpg, .png, .svg, .tga, .tif, .tiff, .xpm;

  • Gravitational Waves Files (LIGO/Virgo): .gwf;

  • Java source files, bytecode and libraries: .class, .java, .jar, .jsp, .war;

  • Libtool output and control files: .la, .lo, .Po, .Plo, .Pla, ‘/\.(?:lib|dep)s/[^/]+$’;

  • Log files: .LOG, .err, .log, .nok, .ok, .out, .output, .stderr, .stdout;

  • Makefiles (make, GNU make, CMake, Imake, etc.): .mk, .make, ‘^(?:(?:(?:GNU|I)?[Mm]akefile(?:\.(?:in|am))?)|CMakeLists\.txt)$’;

  • Object (compiled code) files: .o;

  • Portable Document Format files: .PDF, .pdf;

  • Perl source files including modules: .perl, .pl, .pm;

  • PostScript files (including EPS): .EPS, .PS, .eps, .ps;

  • Python source files, compiled bytecode, pickle & Numpy files: .py, .pyc, .pyo, .npy, .pkl, .pickle;

  • ROOT framework files: .d, .root;

  • Misc scientific software (PAW, IDL, VTK, …) input or data files (HDF5, …): .h5, .idl, .kumac, .mac, .pro, .vtk, .pdb, .pdbqt, .mol;

  • Shell scripts: .awk, .bat, .batch, .bash, .cmd, .csh, .ksh, .sh, .s, .scr, .script, .tcsh, .zsh;

  • Shared libraries (including versioned ones): .so;

  • Static libraries: .a;

  • SVN control files: ‘/\.svn/(?:[^/]+|(?:prop(?:s|-base)|text-base)/.+)$’;

  • TCL source files: .tcl;

  • TeX source and output files: .aux, .dvi, .fig, .lyx, .sty, .tex;

  • Undetermined text files: .csv, .txt;

  • Unknown or absent suffix/extension;

  • XML et al (XSL, WSDL, …): .dtd, .wsdl, .xml, .xsl, .xslt.
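As a rough sketch of how such a table can be applied, the Python snippet below classifies a path using a handful of the categories listed above. The table is deliberately truncated, and the lookup order (regular expressions on the base name first, then the extension) is an assumption; the text does not say which rule takes precedence, and some of the listed expressions match the full path rather than the base name.

```python
import os
import re

# A few entries copied from the list above; the real table is much longer.
EXTENSIONS = {
    ".tar": "Tarballs, zip files",
    ".zip": "Tarballs, zip files",
    ".c": "C/C++ source files",
    ".C": "C/C++ source files",
    ".hpp": "C/C++ source files",
    ".py": "Python source files",
    ".log": "Log files",
}

# Regular-expression rules, tried in order before the extension lookup
# (assumption: the actual precedence is not documented).
PATTERNS = [
    (re.compile(r"^core(?:\.\d+)?$"), "Core files"),
    (re.compile(r"^requirements$"), "CMT control files"),
    (re.compile(r"^\..+$"), "Hidden files"),
]

def guess_category(path):
    """Guess the category of a file from its name only."""
    base = os.path.basename(path)
    for pattern, category in PATTERNS:
        if pattern.search(base):
            return category
    extension = os.path.splitext(base)[1]  # part after the last dot, if any
    return EXTENSIONS.get(extension, "Unknown or absent suffix/extension")

print(guess_category("/sps/groupname/run/core.1234"))  # Core files
print(guess_category("analysis.py"))                   # Python source files
```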

These statistics must be wrong, they do not account for files created today!

These statistics are based on data taken once per day, so they can only report files (and directories/links) that existed when the data acquisition process ran.

These statistics must be wrong, the total size is larger than the size of the filesystem (space)!

These statistics are based on the file sizes reported by the system, just like what is displayed by the ls command.

Some files (known as sparse files) are actually smaller than what they pretend to be and thus use less disk space than their apparent size.

The number of such files and the amount of space they purportedly use (if any) are displayed on the Number of files & files sizes distribution and Files sizes & files sizes distribution images with the violet line.

For some filesystems (spaces), there is also transparent compression on the server. When available, this is mentioned in the Files sizes & files sizes distribution image, in the Specific configuration line.

For these filesystems, the space saved through compression is displayed with the space in holes line (like sparse files).
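The difference between the apparent size of a file and the space it really uses can be checked with a few lines of Python. This is a minimal sketch for POSIX systems, where st_blocks counts 512-byte units; the function names are illustrative and are not part of the monitoring tools.

```python
import os

def apparent_and_allocated(path):
    """Return (apparent size, allocated size) in bytes for a file."""
    st = os.lstat(path)
    # st_size is the size shown by ls -l; st_blocks is the number of
    # 512-byte units actually allocated on disk (POSIX only).
    return st.st_size, st.st_blocks * 512

def space_in_holes(path):
    """Bytes counted in the apparent size but not allocated (never negative)."""
    apparent, allocated = apparent_and_allocated(path)
    return max(0, apparent - allocated)
```

For a sparse file, or for a file on a filesystem with transparent compression, the allocated size is typically lower than the apparent size, which is how the sum of apparent sizes can exceed the size of the filesystem.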

How often/when are the statistics updated?

The statistics are updated once per day, based on data taken between 00:00 and 01:00 in the morning (Lyon time).

New statistics are available between 05:00 and 07:00 (local time, which is UTC+1 without DST and UTC+2 with DST).

So, the statistics can be considered to be a snapshot of the state of a semi-permanent storage space as it was at the end of the previous day.

Can I see the per user statistics of N days ago?

Yes, you can (most of the time).

Let’s say you’re a member of group groupname, and want to know which of your fellow group members is filling most of /sps/groupname with his or her files.

Statistics about /sps/groupname usage for a given date are available at https://ccspsmon.in2p3.fr/users/groupname-YYYY-MM-DD.html, where DD is the day of the month (on two digits), MM is the month (on two digits) and YYYY the year (on four digits).

For example, statistics for Aug 15 2016 would be available at this URL: https://ccspsmon.in2p3.fr/users/groupname-2016-08-15.html.

The same goes for cleanups: substitute cleanup for users in the URLs, and make sure there was a cleanup on the date you’re interested in.
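If you need these URLs often, they can be built with a short Python helper following the pattern described above. The function names are illustrative, and the helper only formats the URL; it does not check that a page exists for the requested date.

```python
from datetime import date, timedelta

BASE_URL = "https://ccspsmon.in2p3.fr"

def stats_url(group, day, kind="users"):
    """Build the URL of the per-user (or cleanup) page for a group and a date."""
    if kind not in ("users", "cleanup"):
        raise ValueError("kind must be 'users' or 'cleanup'")
    # date.isoformat() yields YYYY-MM-DD with zero-padded month and day.
    return f"{BASE_URL}/{kind}/{group}-{day.isoformat()}.html"

def stats_url_days_ago(group, n, today=None, kind="users"):
    """Same as stats_url, for the date n days before today."""
    today = today or date.today()
    return stats_url(group, today - timedelta(days=n), kind)

print(stats_url("groupname", date(2017, 3, 1)))
# https://ccspsmon.in2p3.fr/users/groupname-2017-03-01.html
```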

Can I compare the per user statistics of today and yesterday (or N days before)?

See the previous item.

Open the statistics for the days you’re interested in and compare them.

How does automated cleanup work?

The computing coordinator for your group or experiment defines the cleanup policy with CC-IN2P3 staff.

The most frequent policy is:

  1. If the filesystem is less than 95% full, do nothing. Otherwise, start cleanup.

  2. Get the list of files not accessed for at least 3 months.

  3. Remove as many files as necessary among the files selected in the previous step to lower the filesystem usage to 80%.

  4. Send a success or failure notification e-mail to the experiment support team and/or computing coordinator.

  5. Generate statistics about the cleanup (including files lists).
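Steps 1 to 3 of this policy can be sketched in Python as follows. This is not the actual cleanup code: the FileInfo structure, the approximation of 3 months as 90 days and the choice to remove the least recently accessed files first are all assumptions made for the example.

```python
from dataclasses import dataclass

DAY = 86400  # seconds

@dataclass
class FileInfo:
    path: str
    size: int     # bytes
    atime: float  # last access time, seconds since the epoch

def select_for_cleanup(files, used, capacity, now,
                       start=0.95, target=0.80, max_idle_days=90):
    """Return the files that would be removed under the policy above."""
    # Step 1: below the start threshold, do nothing.
    if used < start * capacity:
        return []
    # Step 2: files not accessed for at least max_idle_days.
    cutoff = now - max_idle_days * DAY
    candidates = [f for f in files if f.atime <= cutoff]
    # Assumption: least recently accessed files go first.
    candidates.sort(key=lambda f: f.atime)
    # Step 3: select files until usage drops to the target.
    selected = []
    for f in candidates:
        if used <= target * capacity:
            break
        selected.append(f)
        used -= f.size
    return selected
```

Note that the target may not be reached if the eligible files are not large enough, which is presumably one of the cases reported as a failure in step 4.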

Other policies are possible and used; for instance, files can be unconditionally removed when they have not been accessed for N days.

The list of files not accessed for N days is extracted from the data file used for the statistics immediately after the data acquisition. The cleanup process then starts immediately.

There are several security and safety mechanisms enforced before removing a file (independently of the policy):

  • If a file is accessed between the time of the statistics data gathering and the actual removal time, it will not be removed.

  • Symlinks are not followed.

  • Only regular files are removed. Directories, symlinks and other objects are left untouched.

  • The access permissions defined for the file to be removed or the directory containing the file are not relevant for the cleanup process.

  • There is no way to prevent a file from being removed automatically when automated cleanup is enabled and the file has not been accessed for enough days (there is no SRM-like or dCache-like file pinning feature).
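The first three checks above could look like the following Python sketch. It is a hypothetical illustration, not the real cleanup code: safe_remove and snapshot_atime are made-up names, and snapshot_atime is assumed to be the access time recorded when the statistics data was gathered.

```python
import os
import stat

def safe_remove(path, snapshot_atime):
    """Remove path only if it still passes the checks; return True if removed."""
    try:
        st = os.lstat(path)           # lstat never follows symlinks
    except FileNotFoundError:
        return False                  # already gone
    if not stat.S_ISREG(st.st_mode):  # directories, symlinks, etc. are left alone
        return False
    if st.st_atime > snapshot_atime:  # accessed since the data was gathered
        return False
    os.unlink(path)
    return True
```

A small window remains between the check and the removal; the sketch does not try to close it.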

I want my group’s SPS space(s) to be cleaned up automatically, who should I contact, who is in charge of starting the cleanups?

Contact your group or experiment computing coordinator (czar).

If there is someone doing specific support for your group at CC-IN2P3, contact this person directly instead.

Nobody starts the cleanup process; it runs automatically with special privileges on some of the servers hosting the space to clean up.

The statistics are based on the data gathered before the cleanup, so they will not reflect the cleanup on the day it takes place.

Some of my files disappeared, what happened?

Are you sure automated cleanup is not enabled for your space?

If cleanup is not enabled, someone may have removed your files by mistake and been able to do so because of lax permissions. In order to remove a file, write access to the directory containing the file is needed.

If the permissions are tight and your files were accessed recently, something bad may be happening. Please contact our user support.

I see discrepancies between some of the graphics for the space(s) of my group, is this a bug?

If you are specifically referring to differences in average file sizes between the Number of files & size distribution and the Files sizes & size distribution images, this is not a bug.

This is because the base used for scaling differs between these images: 1024 (one kilobyte) for Files sizes and 1000 for Number of files. RRDtool averaging amplifies the difference.

RRDtool is the tool used to manage the statistics data and create the graphics.
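To see how much the scaling base alone can change a displayed value, here is a small Python sketch. The scaled function and the 5,000,000-byte average are purely illustrative; the sketch does not reproduce RRDtool's own formatting or averaging.

```python
def scaled(value, base):
    """Divide value by base until it is below base; return (number, exponent)."""
    exponent = 0
    while value >= base:
        value /= base
        exponent += 1
    return value, exponent

average = 5_000_000  # bytes, arbitrary example
print(scaled(average, 1000))  # (5.0, 2)
number, exponent = scaled(average, 1024)
print(round(number, 2), exponent)  # 4.77 2
```

The same average is thus shown as 5.0 with base 1000 and about 4.77 with base 1024, a gap of roughly 5% at this order of magnitude.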