EIC Tutorial: Analysis

Introduction

Overview

Teaching: 15 min
Exercises: 5 min
Questions
  • How do I locate and access the simulation output?

Objectives
  • Understand how the simulation output is organized

  • Know how to access the simulation output using Jefferson Lab xrootd

More detailed information on the simulation productions, including the information presented below, can be found on the Simulation Production Campaign Website.

Note that as of March 2026, Rucio will soon become the default and preferred method to browse and find files and datasets. A tutorial on using Rucio for this purpose will be available soon

Simulation Files Organization

There are three broad classes of files stored on xrootd, each in their own directory:

Most users will interact with the files in the RECO directory and that is what we will focus on in this tutorial. Within the RECO directory, files are organized by campaign (26.02.0 for the February 2026 campaign, for example), detector configuration and then physics process. Each physics process will have different sub directories, for example generator version, energy, or Q2. The directory structure and number of reconstructed files for each campaign can be found on the Simulation Website here.

Note that campaigns more than ~6 months old will not directly be accessible. If you are running this tutorial and encounter a file access error, check the campaign you are trying to access. Where possible, use the latest campaign available.

Access Simulation from Jefferson Lab xrootd

The prefered method for browsing the simulation output is to use xrootd from within the eic-shell. To browse the directory structure and exit, one can run the commands:

./eic-shell
xrdfs root://dtn-eic.jlab.org
ls /volatile/eic/EPIC/RECO/26.02.0
exit

It is also possible to copy a file and open it locally using the xrdcp command:

./eic-shell
xrdcp root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/26.02.0/path-to-file .
exit

Files can also be copied locally by replacing ls with cp.

Streaming Files

It is also possible to open a file directly in ROOT if you have xrootd installed too. Note that the following command should be executed after opening root and TFile::Open() should be used:


auto f = TFile::Open("root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/path-to-file")

Reminder - Download a file for the next step!

We will need a file to analyse going forward, if you have not done so, download a file now!

Grab a file from -

/volatile/eic/EPIC/RECO/26.02.0/epic_craterlake/DIS/NC/18x275/minQ2=10/

For example -

xrdcp root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/26.02.0/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_5.0001.eicrecon.edm4eic.root ./

Note that the ./ at the end is the target location to copy to. Change this as desired.

Note that we can also specify a different filename to copy to as we could with a normal cp command. You might want to do this as the filename is a little cumbersome. I called mine NC_DIS_18x275_Feb26Campaign.root, just replace ./ with your file name of choice.

You can also stream the file if you prefer, just copy the path of the file above. You will need to modify the scripts later in the tutorial accordingly to account for this.

Typically, if you are processing more than a handful of files, it is probably best to stream files from the server rather than downloading a local copy of all files.

Key Points

  • Use xrdfs from within the eic-shell to browse available files from simulations campaigns.

  • Use xrdcp from within eic-shell to copy files to your local environment.

  • Within eic-shell, you can also stream files directly in your root macros.


The Reconstruction Output Tree

Overview

Teaching: 20 min
Exercises: 20 min
Questions
  • What information is stored in the reconstruction output and how can we access it?

Objectives
  • Become familiar with the tree branches

What we generally call the simulation output are root trees generated by the reconstruction software, EICrecon. The branches which appear in the output trees and their content are determined by which EICrecon factories and algorithms are run.

EICrecon Output Tree Structure

The output tree contains various branches appropriate for the individual detector subsystems. For example, the calorimeter subsystems will have branches for reconstructed hits and clusters. In addition to individual subsystem information, there are also branches for reconstructed quantities (e.g. reconstructed particles, inclusive kinematics, and jets) which may combine information from several subsystems. There are also branches encoding relationships between different reconstructed quantities as well as reconstructed and truth quantities.

Exercise

  • Stream a simulation output tree from within root (see previous lesson) and browse the structure by calling new TBrowser()
  • Take some time to explore the branches of the tree. What information is included for various subsystems? What are some of the reconstructed quantities?
  • Try plotting some basic quantities. Plot the cluster energy of the negative endcap ECal - do you see the peak from the scattered electron?

For the last part, some extra tips are included below.

Browsing the file - ROOT Tips

We can navigate around the TBrowser and get it to draw quantities by simply double clicking them. However, sometimes the auto binning can be off or we might want to look at a specific range. We can do this to some extent directly from the TBrowser.

Alternatively, we can also plot variables to histograms of our own choosing on the fly. We can do this via -

root $FILE
events->Draw("QUANTITY")

where $FILE is the file we want to open and “QUANTITY” is the thing we want to draw. For example -

events->Draw("MCParticles.momentum.z")

will draw the MCParticles.momentum.z branch. So far, so much like just double clicking the TBrowser. However, we could define a new histogram and fill this variable to it -

events->Draw("MCParticles.momentum.z>>h1(100,0,100)")

where h1 is our new histogram. This has 100 bins from 0 to 100. We can also apply selection criteria (i.e. cuts) on the fly. These do not need to be on the same variable we are drawing! For example -

events->Draw("MCParticles.momentum.z>>h1(360,-60,300)", "MCParticles.charge<0")

will fill our histogram only with negatively charged particles. We can add more conditions if we want -

events->Draw("MCParticles.momentum.z>>h1(360,-60,300)", "MCParticles.charge<0 && MCParticles.mass>1")

where we now also require that the particle mass is >1.

We can also add drawing options -

events->Draw("MCParticles.momentum.z>>h1(360,-60,300)", "MCParticles.charge<0", "HISTERR")

such as adding error bars. We can also make 2D histograms in the same way -

events->Draw("MCParticles.momentum.z:MCParticles.charge>>h2(40,-2,2, 360,-60,300)", "", "COLZ")

note the order of defining the binning. It’s not what you might expect. Also, to interpret the result (as is often the case with 2D histograms), you might want to set a log Z scale -

gPad->SetLogz()

Which branches do I need?

As you have probably seen by now, the tree in our file contains a lot of information!

For the rest of the tutorial, we will be focusing on a relatively small subset. This is generally true with most analyses, it’s rare we’ll be retaining and looking at every branch. A brief (and incomplete!) dictionary of commonly used branches, including those we’ll need in this tutorial, is included in the “Extras” section. See the tab at the top of the page or follow this link.

To find out more about a particular collection, you could also take a look at the edm4eic datamodel. For some definitions and explanations, you may need to refer to the edm4hep datamodel.

The MCParticles Record

In the last section, the example quantities we were plotting involved the MCparticles branch. Nearly every analysis will include some comparison to the truth level, so it is important to understand how to access generator level information. Particles produced by the Monte Carlo Event Generator and during the interaction of these primary particles with the detector material as modelled by GEANT are stored in the MCParticles branch, this structure is defined by the datatype edm4hep::MCParticle. The particle’s PDG code, charge, production time, mass, production vertex, endpoint, momentum at the production vertex, and momentum at the endpoint are all available. In addition, the status of the particle as defined by the event generator and the detector simulation are stored. For example, if one wanted to look at stable particles from the event generator, they would require MCParticles.generatorStatus == 1. The field MCParticles.simulatorStatus is a bit-field which encodes some information on how the particle propagated through the detector. The detailed definition of the bit assignments can be found in the edm4hep yaml file.

How is the tree populated?

You may wonder how the specific branches of the tree have actually been populated. As this is the output after EICrecon, the trees in the file are those specified as EICrecon was run by the algorithms and factories that were enabled at run time.

This is covered in more detail in the Understanding the Simulation Output tutorial. In particular, Episode 3 of that tutorial details how you can identify the sequence of algorithms utilised to generate a specific output branch. In the case of the example in this tutorial, the track reconstruction is detailed.

Work through the algorithms and factories tutorial. Afterwards, take a closer look at this file. See if you can figure out which algorithm or factory was responsible for creating some of the included branches in this file.

Key Points

  • Output trees contain a lot of information. Take time to explore what is available, identify what you want to try and do, find the relevant branches.

  • The MCParticles branch holds information on generator level particles, critical for use in comparing to what we actually detect!


Analyzing the Reconstruction Output

Overview

Teaching: 20 min
Exercises: 40 min
Questions
  • How does one utilize the reconstruction output trees to do an analysis?

Objectives
  • Become familiar with methods for reading the trees

  • Understand how to access truth/particle information

  • Find track efficiency and resolution

So far, we have only looked at (and plotted) some information from our file interactively. This is very useful and can help us identify the variables we want to deal with. However, we can’t really use these techniques to conduct a full analysis of the data. To do so, we typically use a script or macro. In this part of the tutorial, we will create a script that we can use to do a relatively straightforward analysis of our file.

Note:

  • The branch dictionary outlines all of the branches we will need to utilise in this section.
  • If you want, you can prune the branches you don’t need from the input file using the TreePrune.C script.

Reading the Output Trees

The simulation output trees are “flat” in the sense that there is no event class structure embedded within the tree and no additional libraries are needed to handle the output. Therefore, the end user can simply read the values stored in each branch using whatever method/workflow they are most comfortable with. Examples of several common methods for reading the trees are provided below. We will see a ROOT TTreeReader based example using a ROOT macro and a python/uproot based version. There is also an example using the (relatively) new RDataFrame class of ROOT. During the tutorial, you should try the exercise using whichever language you feel most comfortable with.

Sample Analysis with ROOT TTreeReader: Track Efficiency and Resolution

As a sample exercise to become familiar with the simulation output and how to use it in a realistic setting, we will find the tracking eficiency and resolution. We will need to access the reconstructed track information and the truth particle information and we will have to associate the individual tracks and particles to one another.

Before we begin, we should create a skeleton macro to handle file I/O. For the TTreeReader example, we will use a simple ROOT macro. Using your favorite text editor, create a file with a name like trackAnalysis.C or something similar and copy in the following code:

void trackAnalysis(TString infile="path_to_your_simu_file")
  {
    // Set output file for the histograms
    TFile *ofile = TFile::Open("out.hist.root","RECREATE");

    // Analysis code will go here

    ofile->Write(); // Write histograms to file
    ofile->Close(); // Close output file
  }

We will need momentum, generator status, and particle species information for the truth particles and momentum information for the reconstructed tracks. The reconstructed track information can be accessed from two different branches: CentralCKFTrackParameters and ReconstructedChargedParticles. We can access these branches using a TTreeReaderArray.

ROOT TTreeReaderArrays:

TTreeReader and the associated TTreeReaderArray is a simple interface for reading data from a TTree. The class description and examples can be seen here. To instantiate the reader and access values from a given branch (e.g. the MCParticles branch), one would use the following calls:

// Set up input file chain
TChain *mychain = new TChain("events");
mychain->Add(infile);

// Initialize reader
TTreeReader tree_reader(mychain);

// Access whatever data-members you need
TTreeReaderArray<int> partGenStat(tree_reader, "MCParticles.generatorStatus");
TTreeReaderArray<float> partMomX(tree_reader, "MCParticles.momentum.x");
...

The branches and their members can be viewed by opening a file with TBrowser (new TBrowser()) from within ROOT. Once you have defined the TTreeReaderArray objects for the data-members you want to look at, you can loop over the events and the members within that event:

while(tree_reader.Next()) { // Loop over events
 for(unsigned int i=0; i<partGenStat.GetSize(); i++) // Loop through particles in the event
   {
     int particleStatus = partGenStat[i]; // Access data-members as you would an array
     float particleXMomentum = partMomX[i]; // partMomX should have same number of entries as partGenStat because they are in the same branch
     ...
   }
}

All members of the same branch should have the same number of entries, so it is sufficient to use any member of the branch to set the limit of your loop.

We will proceed using the ReconstructedChargedParticles branch as this will give us a chance to practice using associations, copy the following lines into your analysis macro.

// Set up input file chain
TChain *mychain = new TChain("events");
mychain->Add(infile);

// Initialize reader
TTreeReader tree_reader(mychain);

// Get Particle Information
TTreeReaderArray<int> partGenStat(tree_reader, "MCParticles.generatorStatus");
TTreeReaderArray<double> partMomX(tree_reader, "MCParticles.momentum.x");
TTreeReaderArray<double> partMomY(tree_reader, "MCParticles.momentum.y");
TTreeReaderArray<double> partMomZ(tree_reader, "MCParticles.momentum.z");
TTreeReaderArray<int> partPdg(tree_reader, "MCParticles.PDG");

// Get Reconstructed Track Information
TTreeReaderArray<float> trackMomX(tree_reader, "ReconstructedChargedParticles.momentum.x");
TTreeReaderArray<float> trackMomY(tree_reader, "ReconstructedChargedParticles.momentum.y");
TTreeReaderArray<float> trackMomZ(tree_reader, "ReconstructedChargedParticles.momentum.z");

// Get Associations Between MCParticles and ReconstructedChargedParticles
TTreeReaderArray<int> recoAssoc(tree_reader, "_ReconstructedChargedParticleAssociations_rec.index");
TTreeReaderArray<int> simuAssoc(tree_reader, "_ReconstructedChargedParticleAssociations_sim.index");

The last two lines encode the association between a ReconstructedChargedParticle and an MCParticle where the matching is determined in the ParticlesWithPID algorithm which generates the ReconstructedChargedParticle objects.

Compiling ROOT Macros:

  • If you are analysing a large number of events, you may wish to compile your macro to increase throughput. An example of how you can create and compile a root macro is included in the Exercise Scripts section

Efficiency Analysis

Hint: Refer to the script template if you’re having trouble putting things in the right place.

Now that we have access to the data we need we will begin constructing our efficiency plots, starting with efficiency as a function of the true particle pseudorapidity. The basic strategy is outlined below:

  1. Loop over all events in the file
  2. Within each event, loop over all stable charged particles
  3. Identify the ReconstructedChargedParticle (if any) associated with the truth particle we are looking at
  4. Create and fill the necessary histograms

Here is the code to implement these steps:

// Define Histograms
TH1D *partEta = new TH1D("partEta","Eta of Thrown Charged Particles;Eta",100,-5.,5.);
TH1D *matchedPartEta = new TH1D("matchedPartEta","Eta of Thrown Charged Particles That Have Matching Track",100,-5.,5.);

TH1D *matchedPartTrackDeltaR = new TH1D("matchedPartTrackDeltaR","Delta R Between Matching Thrown and Reconstructed Charged Particle",5000,0.,5.);

while(tree_reader.Next()) { // Loop over events

  for(unsigned int i=0; i<partGenStat.GetSize(); i++) // Loop over thrown particles
    {
	    if(partGenStat[i] == 1) // Select stable thrown particles
	      {
	        int pdg = TMath::Abs(partPdg[i]);

	        if(pdg == 11 || pdg == 13 || pdg == 211 || pdg == 321 || pdg == 2212) // Look at charged particles (electrons, muons, pions, kaons, protons)
	          {
		          TVector3 trueMom(partMomX[i],partMomY[i],partMomZ[i]);

		          float trueEta = trueMom.PseudoRapidity();
		          float truePhi = trueMom.Phi();
	    
		          partEta->Fill(trueEta);

              // Loop over associations to find matching ReconstructedChargedParticle
		          for(unsigned int j=0; j<simuAssoc.GetSize(); j++)
		            {
		              if(simuAssoc[j] == i) // Find association index matching the index of the thrown particle we are looking at
		                {
			                TVector3 recMom(trackMomX[recoAssoc[j]],trackMomY[recoAssoc[j]],trackMomZ[recoAssoc[j]]); // recoAssoc[j] is the index of the matched ReconstructedChargedParticle

                      // Check the distance between the thrown and reconstructed particle
			                float deltaEta = trueEta - recMom.PseudoRapidity();
			                float deltaPhi = TVector2::Phi_mpi_pi(truePhi - recMom.Phi());
			                float deltaR = TMath::Sqrt(deltaEta*deltaEta + deltaPhi*deltaPhi);

			                matchedPartTrackDeltaR->Fill(deltaR);

			                matchedPartEta->Fill(trueEta); // Plot the thrown eta if a matched ReconstructedChargedParticle was found
                    }
                }
            }            
        }
    }
}

We should now have everything we need to find the track efficiency as a function of pseudorapidity. To run the macro and produce an output file containing the histograms we defined, simply type root -l -q trackAnalysis.C. After the macro runs, you can open the root file to inspect the histograms. The efficiency can be found by taking the ratio of matchedPartEta over partEta.

Question:

  • Do the histogram ranges make sense?
  • We plot the distance between thrown and reconstructed charged partices, does this distribution look reasonable?
  • When filling the matchedPartEta histogram (the numerator in our efficiency), why do we use again the true thrown eta instead of the associated reconstructed eta?

Exercise: For all scattered electrons, charged pions and protons in our events:

  • Find the efficiency as a function of particle momentum. Are there cuts on any other quantities you should place to get a sensible result?
  • Find the efficiency for some 2-D correlations: momentum vs eta; phi vs eta
  • Plot some kinematic distributions (momentum, eta, etc) for all ReconstructedChargedParticles, not just those that are associated with a thrown particle

Resolution Analysis

Hint: Refer to the script template if you’re having trouble putting things in the right place.

Next, we will look at track momentum resolution, that is, how well the momentum of the reconstructed track matches that of the thrown particle. We should have all of the “infrastructure” we need in place to do the analysis, we just need to define the appropriate quantities and make the histograms. It only makes sense to define the resolution for tracks and particles which are associated with one another, so we will work within the loop over associations. Define the resolution expression and fill a simple histogram:

TH1D *trackMomentumRes = new TH1D("trackMomentumRes","Track Momentum Resolution",2000,-10.,10.);
...
// Loop over associations to find matching ReconstructedChargedParticle
for(unsigned int j=0; j<simuAssoc.GetSize(); j++)
  {
    if(simuAssoc[j] == i) // Find association index matching the index of the thrown particle we are looking at
      {
        ...
        double momRes = (recMom.Mag() - trueMom.Mag())/trueMom.Mag();

        trackMomentumRes->Fill(momRes);
      }
  }  

While this plot will give us a sense of what the tracking resolution is, we don’t expect the resolution to be constant for all momenta or eta. We can get a more complete picture by plotting the resolution as a function of different kinematic quantities.

Exercise: For all scattered electrons, charged pions and protons in our events:

  • Make 2-D plots of resolution vs true momentum and vs true pseudorapidity.

Question:

  • Will the histogram ranges for each particle species be the same?
  • Could we present the resolution values in a more understandable way?

Sample Analysis with Python/uproot - Pythonic Method: Track Efficiency and Resolution

For some examples of using uproot to access information in .root files, please consult this notebook which can be run in Google Collab.

If you are more familiar with python than you are with C/C++, you might find that using a python based root macro is easier for you. Outlined below are sample blocks of code for creating and running a python based analysis script.

With python, some tasks become easier, e.g. string manipulation and writing to (non ROOT) files.

Before we begin, we should create a skeleton macro to handle file I/O. For this example, we will make a simple python script. Using your favourite editor, create a file with a name like trackAnalysis.py or something similar and copy in the following code:

#! /usr/bin/python
# Import some relevant packages
import uproot as up
import awkward as ak
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.ticker as ticker
import matplotlib.cm as cm
import matplotlib.pylab as plt
import scipy, vector, os
from XRootD import client
from scipy import stats
from matplotlib import pyplot as plt
from matplotlib.gridspec import GridSpec
from matplotlib import colors as colours

# Set some matplot lib features
plt.rcParams['ytick.direction'] = 'in'
plt.rcParams['xtick.direction'] = 'in'
plt.rcParams['xaxis.labellocation'] = 'right'
plt.rcParams['yaxis.labellocation'] = 'top'
plt.rcParams["figure.figsize"] = (16,9)
kP6 = ['#5790fc','#f89c20','#e42536','#964a8b','#9c9ca1','#7a21dd'] # Set ROOT kP6 colours - see https://root.cern.ch/doc/v636/classTColor.html

# Open our file
fname = "INPUT_FILE.root"
if os.path.isfile(fname):
    file=up.open(fname)
else:
    print("Error opening file - ", fname, " check your fname variable!")

# Open the tree
tree = file['events']

# Convert relevant branches to arrays
MCPartBr = tree["MCParticles"].arrays()

# Define some filters

# Use filters to manipulate data

# Create and write some plots

Make sure you change the input file name to match whatever you saved your file as earlier.

Note that we are using the module uproot to access the data here. See further documentation here. You may also need some of the other included packages too.

We will use uproot a little bit like we use the TTreeReader in the other example. We can define the branches we want and assign them to arrays with uproot. We can do this via:

# Open input file and define branches we want to look at with uproot
fname = "INPUT_FILE.root" # Your file name
file=up.open(fname)
tree = file['events']
# Convert relevant branches to arrays
MCPartBr = tree["MCParticles"].arrays()
# If we want, convert a specific branch element to an array
partPdg = tree["MCParticles.PDG"].array()

Uproot effectively takes the information in the tree, and turns it into an array. We can then access and manipulate this array in the same way that we can with any array in python.

Warning: Note that if you are using an older version of uproot (v2.x.x), you will need to access the branches slightly differently via -

partGenStat = tree.array("MCParticles.generatorStatus")

Once you create your script and add the template code in, you can try running it withpython3 trackAnalysis.py or python trackAnalysis.py. At the moment, it shouldn’t do anything, but we can change that!

Try assigning a quantity to an array, such as the MC particles PDG values above and printing some values of that array. Or, perhaps try printing the length of that array. We could also quickly make a plot of the values with -

plt.hist(ak.flatten(partPdg),alpha=0.75, color=kP6[0])
plt.savefig("TestOut.png", dpi = (160))
plt.clf # Clear figure

The script should now write out a figure, TestOut.png when you run it, showing the MC PDG values of entries in the file.

We did not specify a number of bins or a range, so our plot looks a bit odd. What might be a useful range and number of bins to use here? We can specify our number of bins and our range with -

plt.hist(ak.flatten(partPdg),bins=NBins, range=(X,Y),alpha=0.75, color=kP6[0])

With NBins being our number of bins and X and Y being our min/max range - think carefully about these numbers!

Note that we don’t really need to define individual arrays either, we can just directly access the array we want once we’ve converted the branch to a series of arrays -

MCPartBr = tree["MCParticles"].arrays()
plt.hist(ak.flatten(MCPartBr["MCParticles.PDG"]),alpha=0.75, color=kP6[0])
plt.savefig("TestOut.png", dpi = (160))

Note: Remember to call:

plt.clf() # Clear figure

After a figure to avoid drawing on the same plot.

We can also define and apply filters to our arrays as we plot or print from them -

# We will filter on the particle status. Generator status == 1 corresponds to a stable particle (as opposed to a beam or intermediate particle) that we could detect in our detector. 4 is for beam particles
BoolStable=(MCPartBr["MCParticles.generatorStatus"]==1) # This filter is actually an array of booleans. Any where the generator status for a particle == 1 will return true
plt.hist(ak.flatten(MCPartBr["MCParticles.PDG"][BoolStable]),alpha=0.75, color=kP6[0]) # Apply filter to PDG array as we plot it. Only stable particles will now be plotted
plt.savefig("TestOut2.png", dpi = (160))

And we can also combine conditions in our filters -

BoolStablePos=((MCPartBr["MCParticles.generatorStatus"]==1) & (MCPartBr["MCParticles.charge"]>0)) # Create a filter to select out stable, positively chrarged particles 
plt.hist(ak.flatten(MCPartBr["MCParticles.PDG"][BoolStablePos]),alpha=0.75, color=kP6[0]) # Apply filter to PDG array as we plot it. Only stable particles will now be plotted
plt.savefig("TestOut3.png", dpi = (160))

We did not specify a number of bins or a range, so our plot looks a bit odd. What might be a useful range and number of bins to use here?

Efficiency Analysis

Hint: Refer to the script template if you’re having trouble putting things in the right place.

Our approach here is a bit different to the TTreeReader example, but we will still need to utilise our simulation and reconstruction association IDs. We can access them via -

RecoAssocRec = tree['_ReconstructedChargedParticleAssociations_rec'].arrays()
RecoAssocSim = tree['_ReconstructedChargedParticleAssociations_sim'].arrays()
RecID=RecoAssocRec['_ReconstructedChargedParticleAssociations_rec.index'] # Array of reconstructed IDs
SimID=RecoAssocSim['_ReconstructedChargedParticleAssociations_sim.index'] # Array of simulated IDs

These are arrays of indices which map our simulated data to events which have actually been reconstructed in our simulation. We can use these to index our arrays and pick out only events where a simulated particle has a matching reconstructed particle (OR vice versa, where a reconstructed particle has a matching simulated particle). Note that when we index by these particles, we also need to make sure any filters are also indexed appropriately. E.g.

BoolChargeTrack = ((abs(MCPartBr["MCParticles.charge"])!=0) & (MCPartBr["MCParticles.generatorStatus"]==1)) # A filter to select out charged stable particles in the MC data

If we apply this filter to an array which has been indexed by the Simulation ID, we will run into problems -

BoolChargeTrack = ((abs(MCPartBr["MCParticles.charge"])!=0) & (MCPartBr["MCParticles.generatorStatus"]==1)) # A filter to select out charged stable particles in the MC data
MatchPDG = MCPartBr["MCParticles.PDG"][SimID] # An array of the MC PDG values for all MC particles with a matching reconstructed particle.
# print(MatchPDG[BoolChargeTrack]) # Will return an error, arrays don't match sizing!

We could either -

  1. Create a new filter which is explicitly indexed by the SimID’s
  2. Index our previous filter by the SimID as we use it

Both should return the same answer -

print(MatchPDG[BoolChargeTrack[SimID]]) # Explicitly index our filter by SimID first
BoolChargeTrackMatch = ((abs(MCPartBr["MCParticles.charge"][SimID])!=0) & (MCPartBr["MCParticles.generatorStatus"][SimID]==1)) # A filter to select out charged stable particles in the MC data
print(MatchPDG[BoolChargeTrackMatch]) # Use a newly defined filter which has been indexed by the SimID

To select out reconstructed particles that have a matching simulated particle, we can index by RecID in a similar way. Note that we need to be careful what we conclude from our analysis if we match particles in this way. We are “cheating” in the sense that we are directly matching the truth to what we reconstruct. We cannot do this in a real experiment.

We may want to group some of our quantities together as vectors for easy manipulation. For example, we could create an array of vectors corresponding to our charged particles -

MC_Parts = vector.zip({'px': MCPartBr["MCParticles.momentum.x"], 'py': MCPartBr["MCParticles.momentum.y"], 'pz': MCPartBr["MCParticles.momentum.z"]})

We can filter and index this like any other array. Creating 3 or 4-vectors in this way is useful as we can then use various functions to extract information from our vectors. For example, we can easily get -

print(MC_Parts.eta)
print(MC_Parts.theta)
print(MC_Parts.pt)

Now, we can use what we know to determine and plot some simple efficiency graphs for our reconstructed data. Efficiency is a measure of the probability that we will detect an incident particle. We could calculate our efficiency by straightforwardly counting how many particles we detect vs how many we “threw”. For example, if we detect 9 particles and 10 were generated, our efficiency is -

9/10

i.e.

90%

This might be a useful figure. However, if we are evaluating the performance of a detector, it might be useful to consider the efficiency as a function of another quantity, for example eta or p. What this will tell us is how likely we are to detect particles incident on certain areas of the detector (or with a certain momentum for instance). We should not really expect these distributions to be completely flat.

In terms of our code, we can straightforwardly determine this by dividing some histograms. We can divide histograms via -

RecPartBr = tree["ReconstructedChargedParticles"].arrays()
BoolElec=((MCPartBr["MCParticles.PDG"][SimID]==11) & (MCPartBr["MCParticles.generatorStatus"][SimID]==1)) # Define a filter to select out electrons which reconstruct in our detector
MCHist = np.histogram(ak.flatten(MCPartBr["MCParticles.momentum.x"][BoolElec]), bins=100, range=(0,25)) # Create a hisogram of x-momenta for MC particles that are electrons
RecHist = np.histogram(ak.flatten(RecPartBr["ReconstructedChargedParticles.momentum.x"][RecID][BoolElec]), bins=100, range=(0,25)) # Create a histogram of reconstructed x-momenta values for particles that are actual MC electrons that have reconstructed
with np.errstate(divide='ignore'):
    Division = RecHist[0] / MCHist[0]
Division = np.nan_to_num(Division,nan=0, posinf = 0) # Convert any nan or pos inf from /0 (empty bins) to 0
Bin_Edges=MCHist[1]
Bars = 0.5 * (Bin_Edges[1:] + Bin_Edges[:-1])
BarWidth=Bars[1]-Bars[0]
plt.bar(Bars, Division, width=BarWidth, alpha=0.75, color=kP6[2])
plt.savefig("TestOut4.png", dpi = (160))

Important - What we have create here is not our efficiency! We have simply divided the reconstructed electron P_{X} by its true value for MC electrons that have reconstructed in our detector. We used the PDG code for electrons, 11, to pick out electrons from our MC particles branch. There are a few caveats to actually calculating our efficiency. This just demonstrates how we can divide histograms in python.

For our efficiency. We need to compare our thrown particles of a given type to our detected particles of a given type.

Exercise: For all scattered electrons, charged pions and protons in our events:

  • Find the efficiency as a function of particle momentum. Are there cuts on any other quantities you should place to get a sensible result?
  • Find the efficiency for some 2-D correlations: momentum vs eta; phi vs eta
  • Plot some kinematic distributions (momentum, eta, etc) for all ReconstructedChargedParticles, not just those that are associated with a thrown particle

Hint: Getting the right arrays here is a bit tricky. We want three different things -

  • Our MC particles (truth information), regardless of whether we have a matching track or not. This is just:
    • “MCPartBr[‘MCParticles.QUANTITY’]”[SELECTION_CUTS] - We do not need to index this by the SimID
  • Our MC particles (truth information) that do have a matching reconstructed track, we just need to index these by the SimID:
    • “MCPartBr[‘MCParticles.QUANTITY’][SimID]” - We can then apply selection criteria
  • The Reconstructed particle information for events which correspond to a real MC track, we just need to index these by our RecID:
    • “ReconChPartBr[‘ReconstructedChargedParticles.QUANTITY’][RecID]”

2D Histograms: We can make 2D histograms in python via -

plt.hist2d(np.asarray(ak.flatten(Quantity1)), np.asarray(ak.flatten(Quantity2)), bins=[NBinsX,NBinsY], range=[[XLow,XHigh],[YLow,YHigh]], cmin=1)
cb = plt.colorbar()
cb.set_label('Counts/bin')

We can set titles etc as usual. Simply swap on the bin values and ranges, as well as the quantities as needed. Make sure your arrays contain equal numbers of entries.

Resolution Analysis

Hint: Refer to the script template if you’re having trouble putting things in the right place.

Next, we will look at track momentum resolution. The resolution tells us how well we can reconstruct our “true” value. For example. we might want to know how well we can determine the energy of our particles. As such, we could calculate the energy resolution. Our resolution is simply -

This is often expressed as a percentage. So, for example, say we detect a particle and determine its energy to be 0.95 GeV. In reality, the energy was 1 GeV. As such, our energy resolution for this particle is -

Let’s see a quick example of this calculation for a a quantity -

ElecMomXRes = ((RecPartBr["ReconstructedChargedParticles.momentum.x"][RecID][BoolElec]-MCPartBr["MCParticles.momentum.x"][SimID][BoolElec])/MCPartBr["MCParticles.momentum.x"][SimID][BoolElec])*100 # Multiply by 100 to get this as a %
plt.hist(ak.flatten(ElecMomXRes), bins=50, range=(-25,25),alpha=0.5, color=kP6[2])
plt.savefig("TestOut5.png", dpi = (160))

Here we’ve calculated and plotted the X momentum resolution for our charged tracks that correspond to true electrons in our sample. Whilst this plot will give us a sense of what the tracking resolution is, we don’t expect the resolution to be constant for all momenta or eta. We can get a more complete picture by plotting the resolution as a function of different kinematic quantities.

Exercise: For all scattered electrons, charged pions and protons in our events:

  • Make 2-D plots of resolution vs true momentum and vs true pseudorapidity.

Question:

  • Will the histogram ranges for each particle species be the same?
  • Could we present the resolution values in a more understandable way?

Sample Analysis with Python/uproot - ROOT/Pyroot Style: Track Efficiency and Resolution

Comment: Despite using python/uproot, I have written these in a very “ROOT”/C way. Uproot converts our branches to arrays, so you can manipulate them in various fun ways using more pythonic methods if you want. See the previous method for an example of this approach.

If you are more familiar with python than you are with C/C++, you might find that using a python based root macro is easier for you. Outlined below are sample blocks of code for creating and running a python based analysis script.

With python, some tasks become easier, e.g. string manipulation and writing to (non ROOT) files.

Before we begin, we should create a skeleton macro to handle file I/O. For this example, we will make a simple python script. Using your favourite editor, create a file with a name like trackAnalysis.py or something similar and copy in the following code:

#! /usr/bin/python
         
#Import relevant packages
import ROOT, math, array                                
from ROOT import TH1F, TH2F, TMath, TTree, TVector3, TVector2
import uproot as up

# Define and open files
infile="PATH_TO_FILE" 
ofile=ROOT.TFile.Open("TrackAnalysis_OutPy.root", "RECREATE")

# Open input file and define branches we want to look at with uproot
events_tree = up.open(infile)["events"]

# Define histograms below

# Add main analysis loop(s) below

# Write output histograms to file below                

# Close files
ofile.Close()                    

Note that we are using the module uproot to access the data here. See further documentation here. You may also need some of the other included packages too.

We will use uproot a little bit like we use the TTreeReader in the other example. We can define the branches we want and assign them to arrays with uproot. We can do this via:

# Open input file and define branches we want to look at with uproot
events_tree = up.open(infile)["events"]
# Get particle information# Get particle information
partGenStat = events_tree["MCParticles.generatorStatus"].array()
partMomX = events_tree["MCP articles.momentum.x"].array() 
partMomY = events_tree["MCParticles.momentum.y"].array()
partMomZ = events_tree["MCParticles.momentum.z"].array()
partPdg = events_tree["MCParticles.PDG"].array()

# Get reconstructed track information
trackMomX = events_tree["ReconstructedChargedParticles.momentum.x"].array()
trackMomY = events_tree["ReconstructedChargedParticles.momentum.y"].array()
trackMomZ = events_tree["ReconstructedChargedParticles.momentum.z"].array()
 ...

We can then access them as an array in a loop -

# Add main analysis loop(s) below
for i in range(0, len(partGenStat)): # Loop over all events
    for j in range(0, len(partGenStat[i])): # Loop over all thrown particles
        if partGenStat[i][j] == 1: # Select stable particles
            pdg = abs(partPdg[i][j]) # Get PDG for each stable particle
            ...

Uproot effectively takes the information in the tree, and turns it into an array. We can then access and manipulate this array in the same way that we can with any array in python.

Note that if you are using an older version of uproot (v2.x.x), you will need to access the branches slightly differently via -

partGenStat = events_tree.array("MCParticles.generatorStatus")

You can run this file with python3 trackAnalysis.py. It should open your file and create an empty output root file as specified. We will add histograms to this script and fill them in the next step.

Note that depending upon your setup, python trackAnalysis.py may work too.

Efficiency Analysis

Hint: Refer to the script template if you’re having trouble putting things in the right place.

As with the ROOT TTreeReader example, we will find the tracking eficiency and resolution. We will need to access the reconstructed track information and the truth particle information and we will have to associate the individual tracks and particles to one another.

The basic strategy is the same:

  1. Loop over all events in the file
  2. Within each event, loop over all stable charged particles
  3. Identify the ReconstructedChargedParticle (if any) associated with the truth particle we are looking at
  4. Create and fill the necessary histograms

Here is the sample code to implement these steps:

# Get assocations between MCParticles and ReconstructedChargedParticles
recoAssoc = events_tree["_ReconstructedChargedParticleAssociations_rec.index"].array()
simuAssoc = events_tree["_ReconstructedChargedParticleAssociations_sim.index"].array()

# Define histograms below
partEta = ROOT.TH1D("partEta","Eta of Thrown Charged Particles;Eta",100, -5 ,5 )
matchedPartEta = ROOT.TH1D("matchedPartEta","Eta of Thrown Charged Particles That Have Matching Track", 100, -5 ,5);
matchedPartTrackDeltaR = ROOT.TH1D("matchedPartTrackDeltaR","Delta R Between Matching Thrown and Reconstructed Charge Particle", 5000, 0, 5);

# Add main analysis loop(s) below
for i in range(0, len(partGenStat)): # Loop over all events
    for j in range(0, len(partGenStat[i])): # Loop over all thrown particles
        if partGenStat[i][j] == 1: # Select stable particles
            pdg = abs(partPdg[i][j]) # Get PDG for each stable particle
            if(pdg == 11 or pdg == 13 or pdg == 211 or pdg == 321 or pdg == 2212):
                trueMom = ROOT.TVector3(partMomX[i][j], partMomY[i][j], partMomZ[i][j])
                trueEta = trueMom.PseudoRapidity()
                truePhi = trueMom.Phi()
                partEta.Fill(trueEta)
                for k in range(0,len(simuAssoc[i])): # Loop over associations to find matching ReconstructedChargedParticle
                    if (simuAssoc[i][k] == j):
                        recMom = ROOT.TVector3(trackMomX[i][recoAssoc[i][k]], trackMomY[i][recoAssoc[i][k]], trackMomZ[i][recoAssoc[i][k]])
                        deltaEta = trueEta - recMom.PseudoRapidity()
                        deltaPhi = TVector2. Phi_mpi_pi(truePhi - recMom.Phi())
                        deltaR = math.sqrt((deltaEta*deltaEta) + (deltaPhi*deltaPhi))
                        matchedPartEta.Fill(trueEta)
                        matchedPartTrackDeltaR.Fill(deltaR)
                        
# Write output histograms to file below
partEta.Write()
matchedPartEta.Write()
matchedPartTrackDeltaR.Write()

# Close files
ofile.Close()

Insert this block of code appropriately. We should now have everything we need to find the track efficiency as a function of pseudorapidity. Run the script with python3 trackAnalysis.py`. This should produce a root file with a few histograms in place. The efficiency can be found by taking the ratio of matchedPartEta over partEta.

Question:

  • Do the hisotgram ranges make sense?
  • We plot the distance between thrown and reconstructed charged partices, does this distribution look reasonable?
  • When filling the matchedPartEta histogram (the numerator in our efficiency), why do we use again the true thrown eta instead of the associated reconstructed eta?

Exercise: For all scattered electrons, charged pions and protons in our events:

  • Find the efficiency as a function of particle momentum. Are there cuts on any other quantities you should place to get a sensible result?
  • Find the efficiency for some 2-D correlations: momentum vs eta; phi vs eta
  • Plot some kinematic distributions (momentum, eta, etc) for all ReconstructedChargedParticles, not just those that are associated with a thrown particle

Resolution Analysis

Hint: Refer to the script template if you’re having trouble putting things in the right place.

Next, we will look at track momentum resolution, that is, how well the momentum of the reconstructed track matches that of the thrown particle. We should have all of the “infrastructure” we need in place to do the analysis, we just need to define the appropriate quantities and make the histograms. It only makes sense to define the resolution for tracks and particles which are associated with one another, so we will work within the loop over associations. Define the resolution expression and fill a simple histogram by inserting this block of code appropriately:

trackMomentumRes = ROOT.TH1D("trackMomentumRes","Track Momentum Resolution",2000,-10.,10.);
...
                for k in range(0,len(simuAssoc[i])): # Loop over associations to find matching ReconstructedChargedParticle
                    if (simuAssoc[i][k] == j):
                        recMom = ROOT.TVector3(trackMomX[i][recoAssoc[i][k]], trackMomY[i][recoAssoc[i][k]], trackMomZ[i][recoAssoc[i][k]])
                        momRes = (recMom.Mag() - trueMom.Mag())/trueMom.Mag()

                        trackMomentumRes.Fill(momRes)

Remember to write this histogram to the output file too! While this plot will give us a sense of what the tracking resolution is, we don’t expect the resolution to be constant for all momenta or eta. We can get a more complete picture by plotting the resolution as a function of different kinematic quantities.

Exercise: For all scattered electrons, charged pions and protons in our events:

  • Make 2-D plots of resolution vs true momentum and vs true pseudorapidity.

Question:

  • Will the histogram ranges for each particle species be the same?
  • Could we present the resolution values in a more understandable way?

ROOT RDataFrames

Note:

  • This method does actually need you to be within eic-shell (or somewhere else with the correct EDM4hep/EDM4eic libraries installed).

Newer versions of root, such as the version in eic-shell, have access to a relatively new class, RDataFrames. These are similar to pythonic data frame style structures that you may be familiar with. Some people are moving towards utilising RDataFrames in their analysis. If you are more familiar with working with data frames, you may wish to investigate these further.

Included below is a quick script from Simon Gardner that utilises RDataFrames to analyse a data file. Copy the following into a new file called EfficiencyAnalysisRDF.C -

#include <edm4hep/utils/vector_utils.h>
#include <edm4hep/MCParticle.h>
#include <edm4eic/ReconstructedParticle.h>
#include <ROOT/RDataFrame.hxx>
#include <ROOT/RVec.hxx>
#include <TFile.h>

// Define aliases for the data types 
using MCP = edm4hep::MCParticleData;
using RecoP = edm4eic::ReconstructedParticleData;

// Define function to vectorize the edm4hep::utils methods
template <typename T>
auto getEta = [](ROOT::VecOps::RVec<T> momenta) {
  return ROOT::VecOps::Map(momenta, [](const T& p) { return edm4hep::utils::eta(p.momentum); });
};

template <typename T>
auto getPhi = [](ROOT::VecOps::RVec<T> momenta) {
  return ROOT::VecOps::Map(momenta, [](const T& p) { return edm4hep::utils::angleAzimuthal(p.momentum); });
};

// Define the function to perform the efficiency analysis
void EfficiencyAnalysisRDF(TString infile="PATH_TO_FILE"){
   
  // Set up input file 
  ROOT::RDataFrame df("events", infile);

  // Define new dataframe node with additional columns
  auto df1 =  df.Define("statusFilter",  "MCParticles.generatorStatus == 1"    )
                .Define("absPDG",        "abs(MCParticles.PDG)"                )
                .Define("pdgFilter",     "absPDG == 11 || absPDG == 13 || absPDG == 211 || absPDG == 321 || absPDG == 2212")
                .Define("particleFilter","statusFilter && pdgFilter"           )
                .Define("filtMCParts",   "MCParticles[particleFilter]"         )
                .Define("assoFilter",    "Take(particleFilter,_ReconstructedChargedParticleAssociations_simID.index)") // Incase any of the associated particles happen to not be charged
                .Define("assoMCParts",   "Take(MCParticles,_ReconstructedChargedParticleAssociations_simID.index)[assoFilter]")
                .Define("assoRecParts",  "Take(ReconstructedChargedParticles,_ReconstructedChargedParticleAssociations_recID.index)[assoFilter]")
                .Define("filtMCEta",     getEta<MCP>   , {"filtMCParts"} )
                .Define("filtMCPhi",     getPhi<MCP>   , {"filtMCParts"} )
                .Define("accoMCEta",     getEta<MCP>   , {"assoMCParts"} )
                .Define("accoMCPhi",     getPhi<MCP>   , {"assoMCParts"} )
                .Define("assoRecEta",    getEta<RecoP> , {"assoRecParts"})
                .Define("assoRecPhi",    getPhi<RecoP> , {"assoRecParts"})
                .Define("deltaR",        "ROOT::VecOps::DeltaR(assoRecEta, accoMCEta, assoRecPhi, accoMCPhi)");

  // Define histograms
  auto partEta                = df1.Histo1D({"partEta","Eta of Thrown Charged Particles;Eta",100,-5.,5.},"filtMCEta");
  auto matchedPartEta         = df1.Histo1D({"matchedPartEta","Eta of Thrown Charged Particles That Have Matching Track",100,-5.,5.},"accoMCEta");
  auto matchedPartTrackDeltaR = df1.Histo1D({"matchedPartTrackDeltaR","Delta R Between Matching Thrown and Reconstructed Charged Particle",5000,0.,5.},"deltaR");

  // Write histograms to file
  TFile *ofile = TFile::Open("EfficiencyAnalysis_Out_RDF.root","RECREATE");

  // Booked Define and Histo1D lazy actions are only performed here
  partEta->Write();
  matchedPartEta->Write();
  matchedPartTrackDeltaR->Write();
      
  ofile->Close(); // Close output file
}

Note:

  • You will need to run this script with the command root -l -q EfficiencyAnalysisRDF.C++, within eic-shell (or somewhere else with the correct EDM4hep/EDM4eic libraries installed).
  • Remember to put in the correct file path.

If you like, you can try completing the exercises using this example to start from.

PODIO - Direct Analysis

If you want to avoid ROOT entirely, you can analyse the PODIO files directly in a variety of ways.

See Wouter’s example use cases from 23/04/24. Wouter shows a few ways in which the PODIO file can be accessed and analysed directly.

** As of March 2026, a full example and version of this method will be provided in the near future.**

Key Points

  • Flat tree structure provides flexibility in analysis.

  • The ReconstructedChargedParticles branch holds information on reconstructed tracks.


Full Chain Analysis

Overview

Teaching: 15 min
Exercises: 10 min
Questions
  • How do I bring all of this together if I’m starting from scratch?

Objectives
  • Become familiar with the full analysis chain

In this short session, we’ll go through a brief run through of how we actually ended up with a file like the one we ran our script on before. There are 5 basic steps which we’ll look at individually, and then combine together:

  1. Generate an input file (typically hepmc, other formats are useable). This is usually from some external event generator.
  2. Afterburn the file and apply beam effects (might be skipped in some cases).
  3. Process the input through the simulation, DD4HEP.
  4. Reconstruct the DD4HEP output with EICrecon.
  5. Analyse the EICrecon output with analysis script.

Note that for low level analyses, you could also directly analyse the DD4HEP output from step 3. You may also wish to consult Holly’s slides from the April 2024 software meeting for an overview of each of these steps and how they fit into this production chain.

Event Generator Input Files

I won’t say too much on this since this strongly depends upon the channel you want to simulate and analyse. The files here will likely come from an external event generator, for example -

… and may others. However, regardless of what you use, the output is likely some form of .hepmc file with event by event particle/vertex info. For example -

Example HEPMC Event: An example event from a HEPMC file is shown below. In this example event, we have an input 5 GeV electron on a 41 GeV proton. We have one vertex and three outgoing particles, a scattered electron, a pion, and a neutron. In our header, we also have an event weight included.

E 1 1 5 U GEV MM A 0 weight 4.813926604168258e-07 P 1 0 11 6.123233963758798e-16 0.000000000000000e+00 -4.999999973888007e+00 5.000000000000000e+00 5.109989488070365e-04 4 P 2 0 2212 -0.000000000000000e+00 -0.000000000000000e+00 4.100000000000000e+01 4.101073462535657e+01 9.382720881600054e-01 4 V -1 0 [1,2] P 3 -1 11 -6.872312444834133e-01 1.924351128807063e+00 -4.281657822517654e+00 4.744260534644128e+00 5.109989488070365e-04 1 P 4 -1 211 1.042011265882083e+00 -1.600831989262599e+00 1.404460452649878e+00 2.374960954263115e+00 1.395701800000037e-01 1 P 5 -1 2112 -3.547800213986697e-01 -3.235191395444645e-01 3.887719739597977e+01 3.889151313644933e+01 9.395654204998098e-01 1

Typically, we also need to incorporate beam effects. This is done via the use of the afterburner.

Applying Beam Effects - Afterburner

Afterburner applies beam effects to an existing hepmc file. These include effects due to the crabbing of the beam bunches and the crossing angle. Afterburner is pre-installed in eic-shell. We can run it via -

abconv

However, we’ll need an input file to do anything, we can also check other options quickly with -

abconv -h

Note that when we run Afterburner, it will try to pick up the input beam energies and apply the relevant configuration. We can force a different configuration if we want (see the options from the help printout). We could for example though run -

abconv $File -o $OutputFilename

where $File is our input hepmc file from our generator, and $OutputFilename is whatever we want our output to be called.

Regardless of whether we want or need to afterburn the file, we can feed in our hepmc file to DD4HEP and process our events through the detector simulation.

Simulation

To process our events through the simulation, we need to get the detector geometry. The simplest way is simply to source the nightly build within eic-shell -

./eic-shell
source /opt/detector/epic-main/bin/thisepic.sh

We can check this worked as intended by checking that the DETECTOR_PATH variable is now set. Do so via -

ls $DETECTOR_PATH

If we do this without sourcing thisepic.sh, we should get an error. Now, we should see a range of .xml files (which outline various detector configurations). We could also compile our own version of the detector within eic-shell. You might want to do so if you are actively iterating on the design of specific detector for example. See the GitHub page for instructions.

We can now process a simulation. Be aware that this may take some time, so to test it, try processing a small number of events first. Check the options we can provide via -

npsim -h

A typical simulation command might look something like -

npsim --compactFile $DETECTOR_PATH/epic_craterlake.xml --numberOfEvents 1000 --inputFiles input.hepmc --outputFile output.edm4hep.root

Most of the arguments are pretty self explanatory. As a quick demo, I’ll run -

npsim --compactFile $DETECTOR_PATH/epic_craterlake_5x41.xml --numberOfEvents 10 --inputFiles eic_DEMPgen_5on41_ip6_pi+_1B_1.hepmc --outputFile DEMPgen_5on41_pi+_10_TestOutput.edm4hep.root

When we run this, we’ll get lots of printouts to screen, we can supress this by adding the -v5 argument too (only errors will be printed). Once we have our simulation output, we can now reconstruct our events.

Reconstruction

We can run eicrecon pretty straightforwardly, within eicshell, try -

eicrecon -h

which should again, print out the various options we have available. An example command to run the reconstruction on a file might look like this -

eicrecon -Ppodio:output_file=eicrecon_out.root -Pjana:nevents=1000 -Pdd4hep:xml_files=epic_craterlake.xml sim_output.edm4hep.root

Again, this might take a long time. So test a small sample of events first. Following up on my simulation demo, I’ll run -

eicrecon -Ppodio:output_file=DEMPgen_5on41_pi+_10_TestReconOutput.edm4hep.root -Pjana:nevents=10 -Pdd4hep:xml_files=epic_craterlake_5x41.xml DEMPgen_5on41_pi+_10_TestOutput.edm4hep.root

eicrecon will look for the detector .xml file in $DETECTOR_PATH, so make sure the detector geometry is sourced before running eicrecon.

Combining Everything

Ok, great. We now have a file we could run our earlier analysis script on. But what if we wanted to do all of this from scratch? Well, the easiest way might be to put all of this in a shell script. So, pulling all of our commands together -

#! /bin/bash

source /opt/detector/epic-main/bin/thisepic.sh
eval npsim --compactFile $DETECTOR_PATH/epic_craterlake.xml --numberOfEvents 1000 --inputFiles input.hepmc --outputFile output.edm4hep.root
sleep 3
eval eicrecon -Ppodio:output_file=eicrecon_out.root -Pjana:nevents=1000 -Pdd4hep:xml_files=epic_craterlake.xml sim_output.edm4hep.root

exit 0

I’ve premade a version of this with the commands I ran earlier, so we can run it and see what happens.

Farming

Ok, great. We can do (almost) all of the processes we need in one command. But as we’ve seen, the processing can take a while. Realistically, we’re probably going to want to parallelise this in some way. With access to the JLab iFarm or the BNL systems (Condor). We can create and submit compute jobs for this purpose. This is getting a bit beyond the scope of this tutorial, but some things to consider -

For our first point, one easy (and not recommended for Condor jobs!) way to do this is via an EOF line -

#! /bin/bash

cat <<EOF | eic-shell
./Basic_Bash.sh
EOF

exit 0

This just starts eic-shell and runs our earlier script. We can run this WITHOUT running eic-shell first. Note that this is a bit of a cheat to address a pathing issue. $DETECTOR_PATH will be interpreted by the script BEFORE the EOF script so our variable will be mis-set. We can get around this by running a script. Ideally, for our compute job, we should probably also explicitly set our paths to eic-shell and the bash script in some way.

With changes like this made, we could then make a quick job and submit it. This is a bit beyond the tutorials, but for some farm examples, see this job script and this job submission script I use as a template. Feel free to use these as a template for your own jobs, but please thoroughly read through and understand them before submitting a huge number of jobs. Keep the comments above in mind too.

Also a quick disclaimer, my experience in running jobs is limited to systems I know (which does not include the BNL systems). As such, I can’t advise much beyond general slurm job style questions on BNL/Condor. I’m also aware that using EOF in scripts was not encouraged in BNL jobs, see this discussion on mattermost for more.

Warnings

Finally, a major disclaimer. A lot of the time, you should NOT be starting from scratch and processing through the simulation and reconstruction yourself. There are numerous reasons -

Where possible, use files from official simulation campaigns (bringing us full circle, see the first lesson for using a simulation campaign file in a script!). That being said, for testing and iterating rapidly on a design change, running small jobs yourself may be the way to go. It may also help you to understand the full process by seeing the steps involved.

Key Points

  • There are a few steps to go through before we get to the file we analysed previously.

  • Good for testing, but use simulation campaign output where possible!.


Additional Resources

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • Where can I access additional resources?

Objectives
  • Check out the additional resources under the Extras tab!

Check the “Extras” tab at the top of the page for some additional resources. In particular, check out the Exercise Scripts link for some skeletons/templates for the scripts above to get you started. I’ll also add “completed” exercise scripts a bit later on.

The Examples Repository link goes to a repository of analysis scripts/codes currently in use. You might want to refer to this as you get started on your own project if you are struggling to get off the ground. Or, it may just serve as an additional reference point.

Key Points

  • Starting scripts for the exercises are available.

  • Completed (tm) exercise scripts will be available after the lesson.

  • Talk to your colleagues in your WG if you don’t know where to get started. Check the Examples Repository too!