This lesson is in the early stages of development (Alpha version)

Rucio Usage

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • How can I use Rucio?

Objectives
  • Become familiar with aspects of Rucio

  • Use Rucio tags to find specific types of files

Getting Started

We can access and run the Rucio client from within eic-shell. From wherever you have eic-shell:

./eic-shell
rucio whoami

This should print out some information -

email      : eicprod@jlab.org
account    : eicread
account_type : GROUP
...

We can also check the arguments we can supply to rucio, as well as usage info with:

rucio -h

To use Rucio further, we will need to briefly look at how Rucio organises data.

Datasets and DIDs

Typically, we want to analyse data contained within specific files. Files can be grouped together into datasets which can, themselves, be grouped into containers. All three refer to “data”. As such, Rucio uses the term “data identifier” (DID) for any of them. A DID is just the name of a single file, dataset or container.

In Rucio, all DIDs follow a naming scheme which is composed of two strings - a scope and a name, formatted as -

scope:name

For epic, the scope is always epic, meaning that all of our DIDs look like:

epic:name
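As a minimal sketch of the scope:name format, the snippet below splits a DID into its two parts using plain shell parameter expansion. It does not call Rucio at all. The DID is assumed to be the epic scope joined to the example dataset name used in this lesson, and the variable names did, scope and name are only illustrative.

```shell
# Example DID: the "epic" scope plus the example dataset name from this lesson
did="epic:/RECO/26.02.0/epic_craterlake/EXCLUSIVE/DEMP/DEMPgen-1.2.4/10x130/q2_10_20/pi+"

# Everything before the first ":" is the scope
scope="${did%%:*}"

# Everything after the first ":" is the name
name="${did#*:}"

echo "scope: $scope"
echo "name:  $name"
```

Splitting on the first colon only is a deliberate choice here, so that any later colon would stay part of the name.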

The name describes the dataset in question, including details such as the software release used to create the files and the electron and ion beam energies.

As an example, consider the DID for the dataset:

epic:/RECO/26.02.0/epic_craterlake/EXCLUSIVE/DEMP/DEMPgen-1.2.4/10x130/q2_10_20/pi+

The name here, /RECO/26.02.0/epic_craterlake/EXCLUSIVE/DEMP/DEMPgen-1.2.4/10x130/q2_10_20/pi+, tells us many things about the contents of this dataset. Let’s break this down, examining the component between each pair of / separators -

Warning - Not a filepath!

The name of our DID here looks a lot like a filepath. However, it is a flat object and does not have any hierarchy, as we will see in the next section.
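Because the name is a flat string, picking it apart is purely a matter of string handling. The sketch below splits the example name on / and prints each component with its position. It makes no claim about what each component means. It assumes a POSIX-style shell such as bash, and the variable names are illustrative.

```shell
name="/RECO/26.02.0/epic_craterlake/EXCLUSIVE/DEMP/DEMPgen-1.2.4/10x130/q2_10_20/pi+"

# Drop the leading "/" so the first component is not empty
trimmed="${name#/}"

count=0
old_ifs="$IFS"
IFS="/"                      # split on "/" instead of whitespace
for part in $trimmed; do     # unquoted on purpose, so the shell splits it
    count=$((count + 1))
    echo "$count: $part"
done
IFS="$old_ifs"               # restore the normal field separator
```

For this name the loop prints nine numbered components, from RECO through to pi+.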

Other names may not necessarily contain all of the same information, but as a bare minimum, are likely to tell us something about the physics process simulated and beam conditions, as well as which software release was used. This is reflected in the metadata tags assigned as we will see later.

Finding DIDs

Now that we know what a DID looks like, how can we find the DID corresponding to the file or dataset that we’re interested in?

… … …

However, a much easier approach to finding what we need is to use the metadata tags that are assigned to all DIDs from March 2026 onwards.

Metadata Tags

The following tags are available as of March 2026:

As noted on some items in this list, some tags are optional and may not be applied to all datasets. However, the following tags are required for all datasets:

We can use these tags to filter through the available datasets and identify those of interest to us. For example:

Example command
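As a rough sketch of what a tag-based search could look like, the snippet below assembles a rucio list-dids command from a set of key=value filters and prints it without running it. It assumes the classic rucio list-dids subcommand and its --filter option are available in the client inside eic-shell. Newer client versions may lay out their commands differently, so check rucio -h first. The tag names tag_a and tag_b are hypothetical placeholders and are not real ePIC tags.

```shell
# Join key=value pairs with commas, the form --filter takes for several keys
build_filter() {
    filter=""
    for pair in "$@"; do
        # Add a comma before every pair except the first
        filter="${filter:+${filter},}${pair}"
    done
    printf '%s\n' "$filter"
}

# Hypothetical tag names and values, for illustration only
filter="$(build_filter "tag_a=value1" "tag_b=value2")"

# Assemble the command as text; it is printed here, not executed
cmd="rucio list-dids --filter '${filter}' 'epic:*'"
echo "$cmd"
```

Printing the command first makes it easy to check the filter string before pasting it into eic-shell with real tag names substituted.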

Exercise:

Using tags, find the DIDs of the latest:

  • DEMP events in the Q2 range of 3 to 10 for 10 GeV electrons on 250 GeV protons
  • Print the full DID and check the number of files in the dataset

Hint - Check the example name we looked at when introducing DIDs in a previous section.

Using DIDs

Info on checking DID info and downloading

Key Points

  • Rucio works with datasets and Data Identifiers (DIDs)

  • ePIC DIDs may look or be formatted like a nested filepath, but they are flat

  • Tags can be used to quickly sort and find data of interest