Metadata and Databases

Metadata (data about data) plays a critical role in ATLAS data processing and analysis. Database systems store this information and make it available to the many systems that rely on it.

ATLAS is a large multi-purpose experiment at CERN's Large Hadron Collider that explores the physics of head-on collisions of protons (and sometimes heavy ions) at the high energies the LHC provides. Its global data repositories hold tens of petabytes of real and simulated event data, together with the outputs of processing them through the collaboration's standard production chains. Every step in reconstructing and analyzing these data relies on "metadata": the key quantities that describe the underlying data and relate it to the big picture.

Efficient and effective metadata collection and storage are key to the success of this endeavor (and indeed of every process within it). Metadata not only gives a descriptive overview of existing samples but is used in everything from driving large-scale processing to helping physicists find rare events. It is stored centrally in database systems, an approach that has proved very effective at delivering this information to every process that needs it. These systems also serve as the back end for the interfaces physicists use to find the data they need.

Since 2006, Dr. Gallas has taken an active role in developing many applications that use data stored in databases. In 2008 she joined the ATLAS Database Coordination team to promote coherent database application development across the many ATLAS systems that use databases. She was appointed ATLAS Metadata Architect in 2011, working with subsystem, computing, and database experts to ensure optimal storage of, and access to, this critical information.

Four Oxford Ph.D. candidates have made significant contributions to ATLAS database implementations as part of their work on the ATLAS infrastructure. One of these metadata applications, COMA (a repository of Conditions and Configuration Metadata for ATLAS), has been implemented largely by members of the Oxford ATLAS group.

  • Dr. Ryan Buckingham (Ph.D. in 2013) developed a unique browser that makes COMA data available as a Run-based selection service. It not only facilitates data selection by conditions attributes but also shows the user, at each stage, how the conditions already chosen relate to the remaining conditions criteria available. This work was presented at the international CHEP conference in Taiwan in 2010 ("Metadata Aided Run Selection at ATLAS").

  • Dr. Katherine Pachal (Ph.D. in 2015) expanded the information about ATLAS Runs in COMA into four areas: LHC beam configuration, luminosity measurement, magnet states, and aggregate event counts by Run and stream. This work has greatly enhanced the overall utility of COMA and was included in a presentation at the 2012 CHEP conference in New York ("Conditions and configuration metadata for the ATLAS experiment").

  • Dr. Lucy Kogan (Ph.D. in 2016) implemented a description of the ATLAS FSI (Frequency Scanning Interferometry) optical alignment system in the ATLAS Geometry Database, making its physical characteristics and layout available in ATLAS software. This is useful, for example, in ATLAS detector simulations and for visualizing the FSI system in three dimensions, either on its own or within the overall ATLAS event displays.

  • Dr. Lydia Beresford (Ph.D. in 2017) contributed to the COMA project by adding metadata about collections of "Good Run Lists" (GRLs), which define the data (runs and luminosity blocks) satisfying various data quality criteria. She also helped collect metadata on event counts at various stages of processing and analysis, including counts for the data selected by GRLs. This work further extends the overall utility of the information in COMA.