Introduction

Overview

Teaching: 15 min
Exercises: 5 min

Questions

How do I locate and access the simulation output?

Objectives

Understand how the simulation output is organized

Know how to access the simulation output using Jefferson Lab xrootd

More detailed information on the simulation productions, including the information presented below, can be found on the Simulation Production Campaign Website.

Simulation Files Organization

There are three broad classes of files stored on xrootd, each in their own directory:

EVGEN: The input hepmc3 datasets
- E.g. some files that have been supplied by a physics event generator
FULL: The full GEANT4 output root files (usually only saved for a fraction of runs)
- If running a simulation yourself, this would be your output from processing npsim
RECO: The output root files from the reconstruction
- And again, if running yourself, this would be your output from EICrecon (after you’ve used your awesome new reconstruction algorithm from the later tutorial of course)

Most users will interact with the files in the RECO directory and that is what we will focus on in this tutorial. Within the RECO directory, files are organized by campaign (25.01.1 for the January 2025 campaign, for example), detector configuration and then physics process. Each physics process will have different sub directories, for example generator version, energy, or Q2. The directory structure and number of reconstructed files for each campaign can be found on the Simulation Website here.

Access Simulation from Jefferson Lab xrootd

The prefered method for browsing the simulation output is to use xrootd from within the eic-shell. To browse the directory structure and exit, one can run the commands:

./eic-shell
xrdfs root://dtn-eic.jlab.org
ls /volatile/eic/EPIC/RECO/25.01.1
exit

It is also possible to copy a file and open it locally using the xrdcp command:

./eic-shell
xrdcp root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/path-to-file .
exit

Files can also be coppied locally by replacing ls with cp.

Streaming Files

It is also possible to open a file directly in ROOT. Note that the following command should be executed after opening root and TFile::Open() should be used:

auto f = TFile::Open("root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/path-to-file")

Reminder - Download a file for the next step!

We will need a file to analyse going forward, if you have not done so, download a file now!

Grab a file from -

/volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/

For example -

xrdcp root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_5.0001.eicrecon.tree.edm4eic.root ./

Note that the ./ at the end is the target location to copy to. Change this as desired.

Note that we can also specify a different filename to copy to as we could with a normal cp command. You might want to do this as the filename is a little cumbersome. I called mine NC_DIS_18x275_JanCampaign.root, just replace ./ with your file name of choice.

You can also stream the file if you prefer, just copy the path of the file above. You will need to modify the scripts later in the tutorial accordingly to account for this.

Advanced Use Case - Grabbing a whole bunch of files

I won’t go through this in the tutorial, but this may be something you want to come back to as you get deeper into writing and using your own analysis code. This advanced use case involves copying/using a large number of processed files. Something you might want to do once your analysis is out of the testing phase and onto the “Let’s process ALL of the data!” stage.

If you’re moving a lot of files around, you might normally resort to using a wildcard -

cp File* My_Folder/

or similar. However, with xrdcp, this isn’t so trivial. Some methods to test and try are include below.

where here we’re finding things in the given path that match the name pattern provided, and copying them to our current directory.

Alternatively, you could grab a list of the files you want and pipe them to a file -

xrdfs root://dtn-eic.jlab.org ls /volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10 | sed 's|^|root://dtn-eic.jlab.org/|g' > list.txt

In this case, we’re listing all files on the server in that path, piping them to sed and inserting “root://dtn-eic.jlab.org/” at the front and then feeding the output to the file “list.txt”.

more list.txt
root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0000.eicrecon.tree.edm4eic.root
root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0001.eicrecon.tree.edm4eic.root
...

We could then, for example, feed this list to a TChain -

TChain events("events")
std::ifstream in("list.txt")
std::string file("")
while (in >> file) events.Add(file.data())
events.Scan("@MCParticles.size()","","",10)

Where in the final line we’re only going to skim over the first 10 events.

It should be noted that the best solution may just be to run the files from the server, rather than copying them to somewhere else and running them there.

Key Points

Use xrdfs from within the eic-shell to browse available files from simulations campaigns.

Use xrdcp from within eic-shell to copy files to your local environment.

Within eic-shell, you can also stream files directly in your root macros.

The Reconstruction Output Tree

Overview

Teaching: 20 min
Exercises: 20 min

Questions

What information is stored in the reconstruction output and how can we access it?

Objectives

Become familiar with the tree branches

What we generally call the simulation output are root trees generated by the reconstruction software, EICrecon. The branches which appear in the output trees and their content are determined by which EICrecon factories and algorithms are run.

If you’re trying to open interactive windows such as a TBrowser via X-forwarding, it’s likely to be very slow. You may wish to just copy the file to your local machine and open it there.
- The webviewer may also be quicker (just keep in mind that it’s less usable and you will struggle to do more than just browsing the tree).
  - You can’t do much more than view the files with this either. Ultimately, for the final part of the tutorial, you’ll need to be able to execute a ROOT macro/script it some way.

EICrecon Output Tree Structure

The output tree contains various branches appropriate for the individual detector subsystems. For example, the calorimeter subsystems will have branches for reconstructed hits and clusters. In addition to individual subsystem information, there are also branches for reconstructed quantities (e.g. reconstructed particles, inclusive kinematics, and jets) which may combine information from several subsystems. There are also branches encoding relationships between different reconstructed quantities as well as reconstructed and truth quantities.

Exercise

Stream a simulation output tree from within root (see previous lesson) and browse the structure by calling new TBrowser()

Take some time to explore the branches of the tree. What information is included for various subsystems? What are some of the reconstructed quantities?

Try plotting some basic quantities. Plot the cluster energy of the negative endcap ECal - do you see the peak from the scattered electron?

For the last part, some extra tips are included below.

Browsing the file - ROOT Tips

We can navigate around the TBrowser and get it to draw quantities by simply double clicking them. However, sometimes the auto binning can be off or we might want to look at a specific range. We can do this to some extent directly from the TBrowser.

Alternatively, we can also plot variables to histograms of our own choosing on the fly. We can do this via -

root $FILE
events->Draw("QUANTITY")

where $FILE is the file we want to open and “QUANTITY” is the thing we want to draw. For example -

events->Draw("MCParticles.momentum.z")

will draw the MCParticles.momentum.z branch. So far, so much like just double clicking the TBrowser. However, we could define a new histogram and fill this variable to it -

events->Draw("MCParticles.momentum.z>>h1(100,0,100)")

where h1 is our new histogram. This has 100 bins from 0 to 100. We can also apply selection criteria (i.e. cuts) on the fly. These do not need to be on the same variable we are drawing! For example -

events->Draw("MCParticles.momentum.z>>h1(360,-60,300)", "MCParticles.charge<0")

will fill our histogram only with negatively charged particles. We can add more conditions if we want -

events->Draw("MCParticles.momentum.z>>h1(360,-60,300)", "MCParticles.charge<0 && MCParticles.mass>1")

where we now also require that the particle mass is >1.

We can also add drawing options -

events->Draw("MCParticles.momentum.z>>h1(360,-60,300)", "MCParticles.charge<0", "HISTERR")

such as adding error bars. We can also make 2D histograms in the same way -

events->Draw("MCParticles.momentum.z:MCParticles.charge>>h2(40,-2,2, 360,-60,300)", "", "COLZ")

note the order of defining the binning. It’s not what you might expect. Also, to interpret the result (as is often the case with 2D histograms), you might want to set a log Z scale -

gPad->SetLogz()

Which branches do I need?

As you have probably seen by now, the tree in our file contains a lot of information!

For the rest of the tutorial, we will be focusing on a relatively small subset. This is generally true with most analyses, it’s rare we’ll be retaining and looking at every branch. A brief (and incomplete!) dictionary of commonly used branches, including those we’ll need in this tutorial, is incldued in the “Extras” section. See the tab at the top of the page or follow this link.

The MCParticles Record

In the last section, the example quantities we were plotting involved the MCparticles branch. Nearly every analysis will include some comparison to the truth level, so it is important to understand how to access generator level information. Particles produced by the Monte Carlo Event Generator and during the interaction of these primary particles with the detector material as modeled by GEANT are stored in the MCParticles branch, whoes structure is defined by the datatype edm4hep::MCParticle. The particle’s PDG code, charge, production time, mass, production vertex, endpoint, momentum at the production vertex, and momentum at the endpoint are all available. In addition, the status of the particle as defined by the event generator and the detector simulation are stored. For example, if one wanted to look at stable particles from the event generator, they would require MCParticles.generatorStatus == 1. The field MCParticles.simulatorStatus is a bit-field which encodes some information on how the particle propagated through the detector. The detailed definition of the bit assignments can be found in the edm4hep yaml file.

How is the tree populated?

You may wonder how the specific branches of the tree have actually been populated. As this is the output after EICrecon, the trees in the file are those specified as EICrecon was run by the algorithms and factories that were enabled at run time.

This is covered in more detail in the Understanding the Simulation Output tutorial. In particular, Episode 3 of that tutorial details how you can identify the sequence of algorithms utilised to generate a specific output branch. In the case of the example in this tutorial, the track reconstruction is detailed.

Work through the algorithms and factories tutorial. Afterwards, take a closer look at this file. See if you can figure out which algorithm or factory was responsible for creating some of the included branches in this file.

Key Points

Output trees contain a lot of information. Take time to explore what is available, identify what you want to try and do, find the relevant branches.

The MCParticles branch holds information on generator level particles, critical for use in comparing to what we actually detect!

Analyzing the Reconstruction Output

Overview

Teaching: 20 min
Exercises: 40 min

Questions

How does one utilize the reconstruction output trees to do an analysis?

Objectives

Become familiar with methods for reading the trees

Understand how to access truth/particle information

Find track efficiency and resolution

So far, we have only looked at (and plotted) some information from our file interactively. This is very useful and can help us identify the variables we want to deal with. However, we can’t really use these techniques to conduct a full analysis of the data. To do so, we typically use a script or macro. In this part of the tutorial, we will create a script that we can use to do a relatively straightforward analysis of our file.

Note:

The branch dictionary outlines all of the branches we will need to utilise in this section.

If you want, you can prune the branches you don’t need from the input file using the TreePrune.C script.

Reading the Output Trees

The simulation output trees are “flat” in the sense that there is no event class structure embedded within the tree and no additional libraries are needed to handle the output. Therefore, the end user can simply read the values stored in each branch using whatever method/workflow they are most comfortable with. Examples of several common methods for reading the trees are provided below. We will see a ROOT TTreeReader based example using a ROOT macro and a python/uproot based version. There is also an example using the (relatively) new RDataFrame class of ROOT. During the tutorial, you should try the exercise using whichever language you feel most comfortable with.

Sample Analysis with ROOT TTreeReader: Track Efficiency and Resolution

As a sample exercise to become familiar with the simulation output and how to use it in a realistic setting, we will find the tracking eficiency and resolution. We will need to access the reconstructed track information and the truth particle information and we will have to associate the individual tracks and particles to one another.

Before we begin, we should create a skeleton macro to handle file I/O. For the TTreeReader example, we will use a simple ROOT macro. Using your favorite text editor, create a file with a name like trackAnalysis.C or something similar and copy in the following code:

void trackAnalysis(TString infile="path_to_your_simu_file")
  {
    // Set output file for the histograms
    TFile *ofile = TFile::Open("out.hist.root","RECREATE");

    // Analysis code will go here

    ofile->Write(); // Write histograms to file
    ofile->Close(); // Close output file
  }

We will need momentum, generator status, and particle species information for the truth particles and momentum information for the reconstructed tracks. The reconstructed track information can be accessed from two different branches: CentralCKFTrackParameters and ReconstructedChargedParticles. We can access these branches using a TTreeReaderArray.

ROOT TTreeReaderArrays:

TTreeReader and the associated TTreeReaderArray is a simple interface for reading data from a TTree. The class description and examples can be seen here. To instantiate the reader and access values from a given branch (e.g. the MCParticles branch), one would use the following calls:
// Set up input file chain
TChain *mychain = new TChain("events");
mychain->Add(infile);

// Initialize reader
TTreeReader tree_reader(mychain);

// Access whatever data-members you need
TTreeReaderArray<int> partGenStat(tree_reader, "MCParticles.generatorStatus");
TTreeReaderArray<float> partMomX(tree_reader, "MCParticles.momentum.x");
...
The branches and their members can be viewed by opening a file with TBrowser (new TBrowser()) from within ROOT. Once you have defined the TTreeReaderArray objects for the data-members you want to look at, you can loop over the events and the members within that event:
while(tree_reader.Next()) { // Loop over events
 for(unsigned int i=0; i<partGenStat.GetSize(); i++) // Loop through particles in the event
   {
     int particleStatus = partGenStat[i]; // Access data-members as you would an array
     float particleXMomentum = partMomX[i]; // partMomX should have same number of entries as partGenStat because they are in the same branch
     ...
   }
}
All members of the same branch should have the same number of entries, so it is sufficient to use any member of the branch to set the limit of your loop.

We will proceed using the ReconstructedChargedParticles branch as this will give us a chance to practice using associations, copy the following lines into your analysis macro.

// Set up input file chain
TChain *mychain = new TChain("events");
mychain->Add(infile);

// Initialize reader
TTreeReader tree_reader(mychain);

// Get Particle Information
TTreeReaderArray<int> partGenStat(tree_reader, "MCParticles.generatorStatus");
TTreeReaderArray<float> partMomX(tree_reader, "MCParticles.momentum.x");
TTreeReaderArray<float> partMomY(tree_reader, "MCParticles.momentum.y");
TTreeReaderArray<float> partMomZ(tree_reader, "MCParticles.momentum.z");
TTreeReaderArray<int> partPdg(tree_reader, "MCParticles.PDG");

// Get Reconstructed Track Information
TTreeReaderArray<float> trackMomX(tree_reader, "ReconstructedChargedParticles.momentum.x");
TTreeReaderArray<float> trackMomY(tree_reader, "ReconstructedChargedParticles.momentum.y");
TTreeReaderArray<float> trackMomZ(tree_reader, "ReconstructedChargedParticles.momentum.z");

// Get Associations Between MCParticles and ReconstructedChargedParticles
TTreeReaderArray<unsigned int> recoAssoc(tree_reader, "ReconstructedChargedParticleAssociations.recID");
TTreeReaderArray<unsigned int> simuAssoc(tree_reader, "ReconstructedChargedParticleAssociations.simID");

The last two lines encode the association between a ReconstructedChargedParticle and a MCParticle where the matching is determined in the ParticlesWithPID algorithm which generates the ReconstructedChargedParticle objects.

Compiling ROOT Macros:

If you are analysing a large number of events, you may wish to compile your macro to increase throughput. An example of how you can create and compile a root macro is included in the Exercise Scripts section

Efficiency Analysis

Hint: Refer to the script template if you’re having trouble putting things in the right place.

Now that we have access to the data we need we will begin constructing our efficiency plots, starting with efficiency as a function of the true particle pseudorapidity. The basic strategy is outlined below:

Loop over all events in the file
Within each event, loop over all stable charged particles
Identify the ReconstructedChargedParticle (if any) associated with the truth particle we are looking at
Create and fill the necessary histograms

Here is the code to implement these steps:

// Define Histograms
TH1D *partEta = new TH1D("partEta","Eta of Thrown Charged Particles;Eta",100,-5.,5.);
TH1D *matchedPartEta = new TH1D("matchedPartEta","Eta of Thrown Charged Particles That Have Matching Track",100,-5.,5.);

TH1D *matchedPartTrackDeltaR = new TH1D("matchedPartTrackDeltaR","Delta R Between Matching Thrown and Reconstructed Charged Particle",5000,0.,5.);

while(tree_reader.Next()) { // Loop over events

  for(unsigned int i=0; i<partGenStat.GetSize(); i++) // Loop over thrown particles
    {
	    if(partGenStat[i] == 1) // Select stable thrown particles
	      {
	        int pdg = TMath::Abs(partPdg[i]);

	        if(pdg == 11 || pdg == 13 || pdg == 211 || pdg == 321 || pdg == 2212) // Look at charged particles (electrons, muons, pions, kaons, protons)
	          {
		          TVector3 trueMom(partMomX[i],partMomY[i],partMomZ[i]);

		          float trueEta = trueMom.PseudoRapidity();
		          float truePhi = trueMom.Phi();
	    
		          partEta->Fill(trueEta);

              // Loop over associations to find matching ReconstructedChargedParticle
		          for(unsigned int j=0; j<simuAssoc.GetSize(); j++)
		            {
		              if(simuAssoc[j] == i) // Find association index matching the index of the thrown particle we are looking at
		                {
			                TVector3 recMom(trackMomX[recoAssoc[j]],trackMomY[recoAssoc[j]],trackMomZ[recoAssoc[j]]); // recoAssoc[j] is the index of the matched ReconstructedChargedParticle

                      // Check the distance between the thrown and reconstructed particle
			                float deltaEta = trueEta - recMom.PseudoRapidity();
			                float deltaPhi = TVector2::Phi_mpi_pi(truePhi - recMom.Phi());
			                float deltaR = TMath::Sqrt(deltaEta*deltaEta + deltaPhi*deltaPhi);

			                matchedPartTrackDeltaR->Fill(deltaR);

			                matchedPartEta->Fill(trueEta); // Plot the thrown eta if a matched ReconstructedChargedParticle was found
                    }
                }
            }            
        }
    }
}

We should now have everything we need to find the track efficiency as a function of pseudorapidity. To run the macro and produce an output file containing the histograms we defined, simply type root -l -q trackAnalysis.C. After the macro runs, you can open the root file to inspect the histograms. The efficiency can be found by taking the ratio of matchedPartEta over partEta.

Question:

Do the histogram ranges make sense?

We plot the distance between thrown and reconstructed charged partices, does this distribution look reasonable?

When filling the matchedPartEta histogram (the numerator in our efficiency), why do we use again the true thrown eta instead of the associated reconstructed eta?

Exercise:

Find the efficiency as a function of particle momentum. Are there cuts on any other quantities you should place to get a sensible result?

Find the efficiency for some 2-D correlations: momentum vs eta; phi vs eta

Plot some kinematic distributions (momentum, eta, etc) for all ReconstructedChargedParticles, not just those that are associated with a thrown particle

Resolution Analysis

Hint: Refer to the script template if you’re having trouble putting things in the right place.

Next, we will look at track momentum resolution, that is, how well the momentum of the reconstructed track matches that of the thrown particle. We should have all of the “infrastructure” we need in place to do the analysis, we just need to define the appropriate quantities and make the histograms. It only makes sense to define the resolution for tracks and particles which are associated with one another, so we will work within the loop over associations. Define the resolution expression and fill a simple histogram:

TH1D *trackMomentumRes = new TH1D("trackMomentumRes","Track Momentum Resolution",2000,-10.,10.);
...
// Loop over associations to find matching ReconstructedChargedParticle
for(unsigned int j=0; j<simuAssoc.GetSize(); j++)
  {
    if(simuAssoc[j] == i) // Find association index matching the index of the thrown particle we are looking at
      {
        ...
        double momRes = (recMom.Mag() - trueMom.Mag())/trueMom.Mag();

        trackMomentumRes->Fill(momRes);
      }
  }  

While this plot will give us a sense of what the tracking resolution is, we don’t expect the resolution to be constant for all momenta or eta. We can get a more complete picture by plotting the resolution as a function of different kinematic quantities.

Exercise:

Make 2-D plots of resolution vs true momentum and vs true pseudorapidity.

Break resolution plots down by particle species.

Question:

Will the histogram ranges for each particle species be the same?

Could we present the resolution values in a more understandable way?

Sample Analysis with Python/uproot: Track Efficiency and Resolution

Comment: Despite using python/uproot, I have written these in a very “ROOT”/C way. Uproot converts our branches to arrays, so you can manipulate them in various fun ways using more pythonic methods if you want. Shujie Li utilises a more pythonic approach in the Understanding the Simulation Output tutorial. You can see some examples of this in the Jupyter Notebook Shujie set up for that tutorial.

If you are more familiar with python than you are with C/C++, you might find that using a python based root macro is easier for you. Outlined below are sample blocks of code for creating and running a python based analysis script.

With python, some tasks become easier, e.g. string manipulation and writing to (non ROOT) files.

Before we begin, we should create a skeleton macro to handle file I/O. For this example, we will make a simple python script. Using your favorite editor, create a file with a name like trackAnalysis.py or something similar and copy in the following code:

#! /usr/bin/python
         
#Import relevant packages
import ROOT, math, array                                
from ROOT import TH1F, TH2F, TMath, TTree, TVector3, TVector2
import uproot as up

# Define and open files
infile="PATH_TO_FILE" 
ofile=ROOT.TFile.Open("TrackAnalysis_OutPy.root", "RECREATE")

# Open input file and define branches we want to look at with uproot
events_tree = up.open(infile)["events"]

# Define histograms below

# Add main analysis loop(s) below

# Write output histograms to file below                

# Close files
ofile.Close()                    

Note that we are using the module uproot to access the data here. See further documentation here. You may also need some of the other included packages too.

We will use uproot a little bit like we use the TTreeReader in the other example. We can define the branches we want and assign them to arrays with uproot. We can do this via:

# Open input file and define branches we want to look at with uproot
events_tree = up.open(infile)["events"]
# Get particle information# Get particle information
partGenStat = events_tree["MCParticles.generatorStatus"].array()
partMomX = events_tree["MCP articles.momentum.x"].array() 
partMomY = events_tree["MCParticles.momentum.y"].array()
partMomZ = events_tree["MCParticles.momentum.z"].array()
partPdg = events_tree["MCParticles.PDG"].array()

# Get reconstructed track information
trackMomX = events_tree["ReconstructedChargedParticles.momentum.x"].array()
trackMomY = events_tree["ReconstructedChargedParticles.momentum.y"].array()
trackMomZ = events_tree["ReconstructedChargedParticles.momentum.z"].array()
 ...

We can then access them as an array in a loop -

# Add main analysis loop(s) below
for i in range(0, len(partGenStat)): # Loop over all events
    for j in range(0, len(partGenStat[i])): # Loop over all thrown particles
        if partGenStat[i][j] == 1: # Select stable particles
            pdg = abs(partPdg[i][j]) # Get PDG for each stable particle
            ...

Uproot effectively takes the information in the tree, and turns it into an array. We can then acces and manipulate this array in the same way that we can with any array in python.

Note that if you are using an older version of uproot (v2.x.x), you will need to access the branches slightly differently via -

partGenStat = events_tree.array("MCParticles.generatorStatus")

You can run this file with python3 trackAnalysis.py. It should open your file and create an empty output root file as specified. We will add histograms to this script and fill them in the next step.

Note that depending upon your setup, python trackAnalysis.py may work too.

Efficiency Analysis

Hint: Refer to the script template if you’re having trouble putting things in the right place.

As with the ROOT TTreeReader example, we will find the tracking eficiency and resolution. We will need to access the reconstructed track information and the truth particle information and we will have to associate the individual tracks and particles to one another.

The basic strategy is the same:

Loop over all events in the file
Within each event, loop over all stable charged particles
Identify the ReconstructedChargedParticle (if any) associated with the truth particle we are looking at
Create and fill the necessary histograms

Here is the sample code to implement these steps:

# Get assocations between MCParticles and ReconstructedChargedParticles
recoAssoc = events_tree["ReconstructedChargedParticleAssociations.recID"].array()
simuAssoc = events_tree["ReconstructedChargedParticleAssociations.simID"].array()

# Define histograms below
partEta = ROOT.TH1D("partEta","Eta of Thrown Charged Particles;Eta",100, -5 ,5 )
matchedPartEta = ROOT.TH1D("matchedPartEta","Eta of Thrown Charged Particles That Have Matching Track", 100, -5 ,5);
matchedPartTrackDeltaR = ROOT.TH1D("matchedPartTrackDeltaR","Delta R Between Matching Thrown and Reconstructed Charge Particle", 5000, 0, 5);

# Add main analysis loop(s) below
for i in range(0, len(partGenStat)): # Loop over all events
    for j in range(0, len(partGenStat[i])): # Loop over all thrown particles
        if partGenStat[i][j] == 1: # Select stable particles
            pdg = abs(partPdg[i][j]) # Get PDG for each stable particle
            if(pdg == 11 or pdg == 13 or pdg == 211 or pdg == 321 or pdg == 2212):
                trueMom = ROOT.TVector3(partMomX[i][j], partMomY[i][j], partMomZ[i][j])
                trueEta = trueMom.PseudoRapidity()
                truePhi = trueMom.Phi()
                partEta.Fill(trueEta)
                for k in range(0,len(simuAssoc[i])): # Loop over associations to find matching ReconstructedChargedParticle
                    if (simuAssoc[i][k] == j):
                        recMom = ROOT.TVector3(trackMomX[i][recoAssoc[i][k]], trackMomY[i][recoAssoc[i][k]], trackMomZ[i][recoAssoc[i][k]])
                        deltaEta = trueEta - recMom.PseudoRapidity()
                        deltaPhi = TVector2. Phi_mpi_pi(truePhi - recMom.Phi())
                        deltaR = math.sqrt((deltaEta*deltaEta) + (deltaPhi*deltaPhi))
                        matchedPartEta.Fill(trueEta)
                        matchedPartTrackDeltaR.Fill(deltaR)
                        
# Write output histograms to file below
partEta.Write()
matchedPartEta.Write()
matchedPartTrackDeltaR.Write()

# Close files
ofile.Close()

Insert this block of code appropriately. We should now have everything we need to find the track efficiency as a function of pseudorapidity. Run the script with python3 trackAnalysis.py`. This should produce a root file with a few histograms in place. The efficiency can be found by taking the ratio of matchedPartEta over partEta.

Question:

Do the hisotgram ranges make sense?

We plot the distance between thrown and reconstructed charged partices, does this distribution look reasonable?

When filling the matchedPartEta histogram (the numerator in our efficiency), why do we use again the true thrown eta instead of the associated reconstructed eta?

Exercise:

Find the efficiency as a function of particle momentum. Are there cuts on any other quantities you should place to get a sensible result?

Find the efficiency for some 2-D correlations: momentum vs eta; phi vs eta

Plot some kinematic distributions (momentum, eta, etc) for all ReconstructedChargedParticles, not just those that are associated with a thrown particle

Resolution Analysis

Hint: Refer to the script template if you’re having trouble putting things in the right place.

trackMomentumRes = ROOT.TH1D("trackMomentumRes","Track Momentum Resolution",2000,-10.,10.);
...
                for k in range(0,len(simuAssoc[i])): # Loop over associations to find matching ReconstructedChargedParticle
                    if (simuAssoc[i][k] == j):
                        recMom = ROOT.TVector3(trackMomX[i][recoAssoc[i][k]], trackMomY[i][recoAssoc[i][k]], trackMomZ[i][recoAssoc[i][k]])
                        momRes = (recMom.Mag() - trueMom.Mag())/trueMom.Mag()

                        trackMomentumRes.Fill(momRes)

Remember to write this histogram to the output file too! While this plot will give us a sense of what the tracking resolution is, we don’t expect the resolution to be constant for all momenta or eta. We can get a more complete picture by plotting the resolution as a function of different kinematic quantities.

Exercise:

Make 2-D plots of resolution vs true momentum and vs true pseudorapidity.

Break resolution plots down by particle species.

Question:

Will the histogram ranges for each particle species be the same?

Could we present the resolution values in a more understandable way?

ROOT RDataFrames

Note:

This method does actually need you to be within eic-shell (or somewhere else with the correct EDM4hep/EDM4eic libraries installed).

Newer versions of root, such as the version in eic-shell, have access to a relatively new class, RDataFrames. These are similar to pythonic data frame style strucutres that you may be familiar with. Some people are moving towards utilising RDataFrames in their analysis. If you are more familiar with working with data frames, you may wish to investigate these further.

Included below is a quick script from Simon Gardner that utilises RDataFrames to analyse a data file. Copy the following into a new file called EfficiencyAnalysisRDF.C -

#include <edm4hep/utils/vector_utils.h>
#include <edm4hep/MCParticle.h>
#include <edm4eic/ReconstructedParticle.h>
#include <ROOT/RDataFrame.hxx>
#include <ROOT/RVec.hxx>
#include <TFile.h>

// Define aliases for the data types 
using MCP = edm4hep::MCParticleData;
using RecoP = edm4eic::ReconstructedParticleData;

// Define function to vectorize the edm4hep::utils methods
template <typename T>
auto getEta = [](ROOT::VecOps::RVec<T> momenta) {
  return ROOT::VecOps::Map(momenta, [](const T& p) { return edm4hep::utils::eta(p.momentum); });
};

template <typename T>
auto getPhi = [](ROOT::VecOps::RVec<T> momenta) {
  return ROOT::VecOps::Map(momenta, [](const T& p) { return edm4hep::utils::angleAzimuthal(p.momentum); });
};

// Define the function to perform the efficiency analysis
void EfficiencyAnalysisRDF(TString infile="PATH_TO_FILE"){
   
  // Set up input file 
  ROOT::RDataFrame df("events", infile);

  // Define new dataframe node with additional columns
  auto df1 =  df.Define("statusFilter",  "MCParticles.generatorStatus == 1"    )
                .Define("absPDG",        "abs(MCParticles.PDG)"                )
                .Define("pdgFilter",     "absPDG == 11 || absPDG == 13 || absPDG == 211 || absPDG == 321 || absPDG == 2212")
                .Define("particleFilter","statusFilter && pdgFilter"           )
                .Define("filtMCParts",   "MCParticles[particleFilter]"         )
                .Define("assoFilter",    "Take(particleFilter,ReconstructedChargedParticleAssociations.simID)") // Incase any of the associated particles happen to not be charged
                .Define("assoMCParts",   "Take(MCParticles,ReconstructedChargedParticleAssociations.simID)[assoFilter]")
                .Define("assoRecParts",  "Take(ReconstructedChargedParticles,ReconstructedChargedParticleAssociations.recID)[assoFilter]")
                .Define("filtMCEta",     getEta<MCP>   , {"filtMCParts"} )
                .Define("filtMCPhi",     getPhi<MCP>   , {"filtMCParts"} )
                .Define("accoMCEta",     getEta<MCP>   , {"assoMCParts"} )
                .Define("accoMCPhi",     getPhi<MCP>   , {"assoMCParts"} )
                .Define("assoRecEta",    getEta<RecoP> , {"assoRecParts"})
                .Define("assoRecPhi",    getPhi<RecoP> , {"assoRecParts"})
                .Define("deltaR",        "ROOT::VecOps::DeltaR(assoRecEta, accoMCEta, assoRecPhi, accoMCPhi)");

  // Define histograms
  auto partEta                = df1.Histo1D({"partEta","Eta of Thrown Charged Particles;Eta",100,-5.,5.},"filtMCEta");
  auto matchedPartEta         = df1.Histo1D({"matchedPartEta","Eta of Thrown Charged Particles That Have Matching Track",100,-5.,5.},"accoMCEta");
  auto matchedPartTrackDeltaR = df1.Histo1D({"matchedPartTrackDeltaR","Delta R Between Matching Thrown and Reconstructed Charged Particle",5000,0.,5.},"deltaR");

  // Write histograms to file
  TFile *ofile = TFile::Open("EfficiencyAnalysis_Out_RDF.root","RECREATE");

  // Booked Define and Histo1D lazy actions are only performed here
  partEta->Write();
  matchedPartEta->Write();
  matchedPartTrackDeltaR->Write();
      
  ofile->Close(); // Close output file
}

Note:

You will need to run this script with the command root -l -q EfficiencyAnalysisRDF.C++, within eic-shell (or somewhere else with the correct EDM4hep/EDM4eic libraries installed).

Remember to put in the correct file path.

If you like, you can try completing the exercises using this example to start from.

PODIO - Direct Analysis

If you want to avoid ROOT entirely, you can analyse the PODIO files directly in a variety of ways.

See Wouter’s example use cases from 23/04/24. Wouter shows a few ways in which the PODIO file can be accssed and analysed directly.

Key Points

Flat tree structure provides flexibility in analysis.

The ReconstructedChargedParticles branch holds information on reconstructed tracks.

Full Chain Analysis

Overview

Teaching: 15 min
Exercises: 10 min

Questions

How do I bring all of this together if I’m starting from scratch?

Objectives

Become familiar with the full analysis chain

In this short session, we’ll go through a brief run through of how we actually ended up with a file like the one we ran our script on before. There are 5 basic steps which we’ll look at individually, and then combine together:

Generate an input file (typically hepmc, other formats are useable). This is usually from some external event generator.
Afterburn the file and apply beam effects (might be skipped in some cases).
Process the input through the simulation, DD4HEP.
Reconstruct the DD4HEP output with EICrecon.
Analyse the EICrecon output with analysis script.

Note that for low level analyses, you could also directly analyse the DD4HEP output from step 3. You may also wish to consult Holly’s slides from the April 2024 software meeting for an overview of each of these steps and how they fit into this production chain.

Event Generator Input Files

I won’t say too much on this since this strongly depends upon the channel you want to simulate and analyse. The files here will likely come from an external event generator, for example -

PYTHIA
BeAGLE
DJANGOH
MILOU
eSTARlight
LaGER
DEMPgen
elSPectro

… and may others. However, regardless of what you use, the output is likely some form of .hepmc file with event by event particle/vertex info. For example -

Example HEPMC Event: An example event from a HEPMC file is shown below. In this example event, we have an input 5 GeV electron on a 41 GeV proton. We have one vertex and three outgoing particles, a scattered electron, a pion, and a neutron. In our header, we also have an event weight included.

E 1 1 5 U GEV MM A 0 weight 4.813926604168258e-07 P 1 0 11 6.123233963758798e-16 0.000000000000000e+00 -4.999999973888007e+00 5.000000000000000e+00 5.109989488070365e-04 4 P 2 0 2212 -0.000000000000000e+00 -0.000000000000000e+00 4.100000000000000e+01 4.101073462535657e+01 9.382720881600054e-01 4 V -1 0 [1,2] P 3 -1 11 -6.872312444834133e-01 1.924351128807063e+00 -4.281657822517654e+00 4.744260534644128e+00 5.109989488070365e-04 1 P 4 -1 211 1.042011265882083e+00 -1.600831989262599e+00 1.404460452649878e+00 2.374960954263115e+00 1.395701800000037e-01 1 P 5 -1 2112 -3.547800213986697e-01 -3.235191395444645e-01 3.887719739597977e+01 3.889151313644933e+01 9.395654204998098e-01 1

Typically, we also need to incorporate beam effects. This is done via the use of the afterburner.

Applying Beam Effects - Afterburner

Afterburner applies beam effects to an existing hepmc file. These include effects due to the crabbing of the beam bunches and the crossing angle. Afterburner is pre-installed in eic-shell. We can run it via -

abconv

However, we’ll need an input file to do anything, we can also check other options quickly with -

abconv -h

Note that when we run Afterburner, it will try to pick up the input beam energies and apply the relevant configuration. We can force a different configuration if we want (see the options from the help printout). We could for example though run -

abconv $File -o $OutputFilename

where $File is our input hepmc file from our generator, and $OutputFilename is whatever we want our output to be called.

Regardless of whether we want or need to afterburn the file, we can feed in our hepmc file to DD4HEP and process our events through the detector simulation.

Simulation

To process our events through the simulation, we need to get the detector geometry. The simplest way is simply to source the nightly build within eic-shell -

./eic-shell
source /opt/detector/epic-main/bin/thisepic.sh

We can check this worked as intended by checking that the DETECTOR_PATH variable is now set. Do so via -

ls $DETECTOR_PATH

If we do this without sourcing thisepic.sh, we should get an error. Now, we should see a range of .xml files (which outline various detector configurations). We could also compile our own version of the detector within eic-shell. You might want to do so if you are actively iterating on the design of specific detector for example. See the GitHub page for instructions.

We can now process a simulation. Be aware that this may take some time, so to test it, try processing a small number of events first. Check the options we can provide via -

npsim -h

A typical simulation command might look something like -

npsim --compactFile $DETECTOR_PATH/epic_craterlake.xml --numberOfEvents 1000 --inputFiles input.hepmc --outputFile output.edm4hep.root

Most of the arguments are pretty self explanatory. As a quick demo, I’ll run -

npsim --compactFile $DETECTOR_PATH/epic_craterlake_5x41.xml --numberOfEvents 10 --inputFiles eic_DEMPgen_5on41_ip6_pi+_1B_1.hepmc --outputFile DEMPgen_5on41_pi+_10_TestOutput.edm4hep.root

When we run this, we’ll get lots of printouts to screen, we can supress this by adding the -v5 argument too (only errors will be printed). Once we have our simulation output, we can now reconstruct our events.

Reconstruction

We can run eicrecon pretty straightforwardly, within eicshell, try -

eicrecon -h

which should again, print out the various options we have available. An example command to run the reconstruction on a file might look like this -

eicrecon -Ppodio:output_file=eicrecon_out.root -Pjana:nevents=1000 -Pdd4hep:xml_files=epic_craterlake.xml sim_output.edm4hep.root

Again, this might take a long time. So test a small sample of events first. Following up on my simulation demo, I’ll run -

eicrecon -Ppodio:output_file=DEMPgen_5on41_pi+_10_TestReconOutput.edm4hep.root -Pjana:nevents=10 -Pdd4hep:xml_files=epic_craterlake_5x41.xml DEMPgen_5on41_pi+_10_TestOutput.edm4hep.root

eicrecon will look for the detector .xml file in $DETECTOR_PATH, so make sure the detector geometry is sourced before running eicrecon.

Combining Everything

Ok, great. We now have a file we could run our earlier analysis script on. But what if we wanted to do all of this from scratch? Well, the easiest way might be to put all of this in a shell script. So, pulling all of our commands together -

#! /bin/bash

source /opt/detector/epic-main/bin/thisepic.sh
eval npsim --compactFile $DETECTOR_PATH/epic_craterlake.xml --numberOfEvents 1000 --inputFiles input.hepmc --outputFile output.edm4hep.root
sleep 3
eval eicrecon -Ppodio:output_file=eicrecon_out.root -Pjana:nevents=1000 -Pdd4hep:xml_files=epic_craterlake.xml sim_output.edm4hep.root

exit 0

I’ve premade a version of this with the commands I ran earlier, so we can run it and see what happens.

Farming

Ok, great. We can do (almost) all of the processes we need in one command. But as we’ve seen, the processing can take a while. Realistically, we’re probably going to want to parallelise this in some way. With access to the JLab iFarm or the BNL systems (Condor). We can create and submit compute jobs for this purpose. This is getting a bit beyond the scope of this tutorial, but some things to consider -

Our job needs to either access the container, or process eic-shell within the job (more on this below)
The job itself should be as simple as possible, just exectuing a command with some arugments. Our script above is a good candidate (with some work)
As is, our script is fairly inflexible. We should probably make things like the input and output file names variables that are set based upon arguments we provide.
We need to consider the resource usage of our job carefully.
Pathing can be tricky, we need to make sure the farm/compute node picks up the correct paths such as $DETECTOR_PATH (this is a common job error).
As always, TEST first. Run a small job that runs quickly interactively, THEN submit it is a small compute job. Compare the outputs.

For our first point, one easy (and not recommended for Condor jobs!) way to do this is via an EOF line -

#! /bin/bash

cat <<EOF | eic-shell
./Basic_Bash.sh
EOF

exit 0

This just starts eic-shell and runs our earlier script. We can run this WITHOUT running eic-shell first. Note that this is a bit of a cheat to address a pathing issue. $DETECTOR_PATH will be interpreted by the script BEFORE the EOF script so our variable will be mis-set. We can get around this by running a script. Ideally, for our compute job, we should probably also explicitly set our paths to eic-shell and the bash script in some way.

With changes like this made, we could then make a quick job and submit it. This is a bit beyond the tutorials, but for some farm examples, see this job script and this job submission script I use as a template. Feel free to use these as a template for your own jobs, but please thoroughly read through and understand them before submitting a huge number of jobs. Keep the comments above in mind too.

Also a quick disclaimer, my experience in running jobs is limited to systems I know (which does not include the BNL systems). As such, I can’t advise much beyond general slurm job style questions on BNL/Condor. I’m also aware that using EOF in scripts was not encouraged in BNL jobs, see this discussion on mattermost for more.

Warnings

Finally, a major disclaimer. A lot of the time, you should NOT be starting from scratch and processing through the simulation and reconstruction yourself. There are numerous reasons -

Computing time intensive
Versioning errors/mismatch
Not as reproducible (if you find an error, people will need to try and reproduce it from your environment)

Where possible, use files from official simulation campaigns (bringing us full circle, see the first lesson for using a simulation campaign file in a script!). That being said, for testing and iterating rapidly on a design change, running small jobs yourself may be the way to go. It may also help you to understand the full process by seeing the steps involved.

Key Points

There are a few steps to go through before we get to the file we analysed previously.

Good for testing, but use simulation campaign output where possible!.

Additional Resources

Overview

Teaching: 0 min
Exercises: 0 min

Questions

Where can I access additional resources?

Objectives

Check out the additional resources under the Extras tab!

Check the “Extras” tab at the top of the page for some additional resources. In particular, check out the Exercise Scripts link for some skeletons/templates for the scripts above to get you started. I’ll also add “completed” exercise scripts a bit later on.

The Examples Repository link goes to a repository of analysis scripts/codes currently in use. You might want to refer to this as you get started on your own project if you are struggling to get off the ground. Or, it may just serve as an additional reference point.

Key Points

Starting scripts for the exercises are available.

Completed (tm) exercise scripts will be available after the lesson.

Talk to your colleagues in your WG if you don’t know where to get started. Check the Examples Repository too!

EIC Tutorial: Analysis

Introduction

Overview

Simulation Files Organization

Access Simulation from Jefferson Lab xrootd

Streaming Files

Reminder - Download a file for the next step!

Advanced Use Case - Grabbing a whole bunch of files

Key Points

The Reconstruction Output Tree

Overview

EICrecon Output Tree Structure

Browsing the file - ROOT Tips

Which branches do I need?

The MCParticles Record

How is the tree populated?

Key Points

Analyzing the Reconstruction Output

Overview

Reading the Output Trees

Sample Analysis with ROOT TTreeReader: Track Efficiency and Resolution

Efficiency Analysis

Resolution Analysis

Sample Analysis with Python/uproot: Track Efficiency and Resolution

Efficiency Analysis

Resolution Analysis

ROOT RDataFrames

PODIO - Direct Analysis

Key Points

Full Chain Analysis

Overview

Event Generator Input Files

Applying Beam Effects - Afterburner

Simulation

Reconstruction

Combining Everything

Farming

Warnings

Key Points

Additional Resources

Overview

Key Points