Introduction
Overview
Teaching: 5 min
Exercises: 0 minQuestions
Objectives
Define physics goal of new reconstruction algorithm
Identify what information is needed to accomplish this goal
Background
The simulation-reconstruction-analysis pipeline
The ePIC software pipeline consists of the following stages.
- Event generation
- Simulation
- Digitization (eventually)
- Reconstruction
- Analysis
This tutorial focuses exclusively on the reconstruction stage and the EICrecon software package. At the highest conceptual level, EICrecon’s inputs are simulated (and eventually, optionally digitized) detector hits, and the outputs are reconstructed tracks and particle identifications. Internally, EICrecon uses a data model and software architecture which is designed to cleanly separate dozens of different reconstruction algorithms and facilitate reconfiguring them and reusing them according to users’ needs.
Levels of user interaction with reconstruction
Depending on how a user contributes to ePIC, their level of interaction with EICrecon will vary. We identify four distinct patterns, or levels of interaction:
- Performing analyses using the reconstructed data as-is
- Modifying an algorithm’s parameters and re-running the reconstruction
- Applying an existing algorithm to a fresh context
- Adding a new algorithm
This tutorial is designed to be most helpful for users doing (3) or (4), although users at all levels will benefit from the deeper understanding of EICrecon’s architecture.
Motivating example: simple electron ID with E/p cut
One key ingredient in electron ID is the ratio of the energy deposited in the calorimeter (E) to the momentum of the particle track (p). For electrons, this should be close to 1. We want to implement a rudimentary electron ID algorithm by identifying particles that satisfy 0.9 < E/p < 1.2.
What information is required?
This simple electron ID algorithm requires three pieces of information, which will be obtained from pre-existing algorithms/factories:
- Reconstructed tracks
- Calorimeter clusters
- Matching between tracks and clusters
Matching between tracks and clusters can be obtained from truth information, or from track projections.
- In the former case, the reconstructed tracks and clusters are matched through their association to the same MC particle.
- In the latter case, a matching criteria is applied to the position of the calorimeter cluster and the projected position of the track at the calorimeter.
For the purposes of this tutorial, we will use truth matching.
The input and output objects of our factory should be stored as PODIO collections.
- Reconstructed tracks are stored as a collection of
edm4eic::ReconstructedParticle - Calorimeter clusters are stored as a collection of
edm4eic::Cluster - Associations between reconstructed tracks and MC particles are stored as a collection of
edm4eic::MCRecoParticleAssociation - Associations between calorimeter clusters and MC particles are stored as a collection of
edm4eic:MCRecoClusterParticleAssociation
The members of each of these data types can be found in edm4eic.yaml in the EDM4eic repository.
Key Points
Creating a factory
Overview
Teaching: 10 min
Exercises: 1 minQuestions
Objectives
Understand the basics of EICrecon’s plugin structure
Understand where to put new factories
Understand which factory base class to use
Understand the JOmniFactory interface
Algorithms and Factories
We make a crisp distinction between algorithms and factories.
Algorithms are classes that perform one kind of calculation we need and they do so in a generic, framework-independent way. Factories, in comparison, are classes that attach Algorithms to the JANA framework. You should write Algorithms to be independently testable, and you should write a Factory so that JANA can use the Algorithm within EICrecon. The factory layer handles issues like obtaining all of the inputs from other factories, publishing the outputs so that other factories can use them or they can be written to file, obtaining the correct parameters, making sure the Algorithm has been initialized, and making sure that the correct calibrations are loaded when the run number changes.
Here’s an example to help illustrate what goes into the Algorithm and what goes into the Factory. Consider calorimeter clustering. The clustering algorithm should be independent of any individual detector, and have a set of parameters that control its behavior and live in a plain-old-data Config object. You could copy-paste this code into a codebase that uses a completely different reconstruction framework and it would still work, as long as you were using the same datamodel (e.g. edm4hep). Each detector could have its own factory (if it calls the algorithm in a substantially different way) or they may all use the same factory (if the factories only differ in their parameter values). The parameter values themselves could be hardcoded to the factory, but we strongly prefer to set them externally using a factory generator. This gives us a cleaner separation of configuration from code, and will let us do fun things in the future such as wiring factories together from an external config file, or performing parameter studies.
The basics of EICrecon’s plugin structure
JANA plugins are a mechanism for controlling which parts of EICrecon get compiled and linked together. They give us the ability to avoid having to compile and link heavy dependencies that not everybody will be using all the time. For instance, by default EICrecon uses ACTS for tracking, but perhaps someone wants to benchmark ACTS against Genfit – we wouldn’t want to have to ship Genfit inside eic-shell all the time.
Plugins were also designed so that users could integrate their analyses directly into reconstruction while keeping them independent and optional. This pattern is heavily used in the GlueX experiment and recommended in the tutorials on JANA’s own documentation. In EICrecon, we set up separate plugins for each detector and each benchmark, but not for each analysis. We strongly recommend following the advice given in the analysis tutorials instead. The instructions for adding a new plugin are here.
Where to put new factories
The EICrecon plugins are organized as follows. Under src/detectors we have subdirectories for each individual detector, and each of them corresponds to one plugin that adds the detector’s factory generators. Benchmarks are analogous. If an algorithm/factory will only ever be used in that one context, it can live there; otherwise, and preferably, the corresponding algorithm lives under src/algorithms and the corresponding factory lives under src/factories.
Once you figure out which plugin your algorithm naturally belongs to, find its InitPlugin() method. By convention this lives in a .cc file with the same name as the plugin itself. This is where you will add your factory generator.
Which factory base class to use
There are a number of different kinds of factories available in JANA which we have used within EICrecon at different points in time. Luckily, if you are writing an Algorithm from scratch, there is only one you will need to be familiar with: JOmniFactory. However, some of the earlier ones are still around, and just in case you need to modify or reuse those, here is a quick history lesson.
JFactoryT<T> is JANA’s fundamental factory base class. However, we don’t use it in EICrecon because it has the following limitations:
-
It has difficulty with PODIO data. PODIO data needs very special handling, otherwise it will leak memory or corrupt your object associations. To address this, we developed
JFactoryPodioT, which extendsJFactoryTto support PODIO correctly. -
It can only output one collection. This might seem fine at first, but frequently we need to output “Association” collections alongside the primary output collection. To address this, we developed
JMultifactory, which supports multiple outputs, including PODIO data. -
If you want to reuse an Algorithm in a different context, you need to duplicate the JFactoryT/JPodioFactoryT/JMultifactory. Until this point, collection and parameter names were hardcoded inside individual factories. To get around this, we developed
JChainMultifactoryTso that we could create multiple instances of the same factory and assign them different collection and parameter names in a logical way. -
It requires a deeper understanding of JANA internals to use correctly. The user is allowed to perform actions inside the factory callbacks that don’t necessarily make sense. We remedied this issue by developing
JOmniFactory, which declares what it needs upfront, and JANA provides it only when it makes sense.JOmniFactorysupports all of the functionality developed for points (1), (2), and (3), and presents a simpler interface.
In summary, always use JOmniFactory if you are writing something new. The migration of all EICrecon factories to JOmniFactory (tracked in https://github.com/eic/EICrecon/issues/1176) is essentially complete, so you should not encounter the older base classes in current code.
The JOmniFactory interface
The basic idea behind an OmniFactory is to declare what you need upfront. That way, the framework can retrieve everything you need at the right time, and it can handle complex namespacing logic behind the scenes so that you can dynamically rewire and reconfigure factories.
Earlier factory base classes, such as JChainMultifactory, require users to do a lot in their callbacks. Not so with JOmniFactory, which moves most of the functionality into registered members instead, as we shall discuss later. The callbacks are still there, but are made much simple, and focus on satisfying the underlying Algorithm’s needs instead of JANA’s.
These are the callbacks you’ll need to implement:
void Configure();
void ChangeRun(int32_t run_number);
void Process(int32_t run_number, uint64_t event_number);
Configure is called once when the factory is instantiated. This is where the user should initialize the underlying Algorithm. JANA will have already fetched the services, configured the logger, and set the values of the Config struct, so all the user needs to do is pass these things to the Algorithm.
ChangeRun is called once JANA detects that a new run has been started. This is where the user should update calibration data or other resources keyed off of the run number. JOmniFactory also provides a Resource registered member to automatically retrieve data from an arbitrary Service, though this is still experimental.
Process is called for every event. (Side note: Although note that because different threads have different factory instances, any individual factory cannot be guaranteed to witness the entire event stream. If you need to have one instance that processes the entire event stream, JANA provides JEventProcessors and JEventProcessorSequential for that purpose.) JANA will have already prefetched the registered Inputs before Process is called. The user needs to execute the Algorithm using those inputs, and copy the resulting outputs back to the registered Outputs. JANA will then take care of publishing the outputs downstream.
Note that unlike earlier factory base classes, JOmniFactory uses the Curiously Recurring Template Pattern so that the callback methods aren’t virtual. This lets the optimizer get rid of any performance penalty for the extra layer of indirection.
Here is the full JOmniFactory code skeleton:
#pragma once
#include "extensions/jana/JOmniFactory.h"
class ReconstructedElectrons_factory : public JOmniFactory<ReconstructedElectrons_factory> {
private:
// Declare inputs and outputs
// PodioInput<edm4hep::MCParticle> m_in_mc_particles {this, "MCParticles"};
// PodioOutput<edm4eic::ReconstructedParticle> m_out_reco_particles {this};
// Declare parameters
// ParameterRef<double> m_min_energy_over_momentum {this, "minEnergyOverMomentum", config().min_energy_over_momentum};
// Declare services
// Service<DD4hep_service> m_geoSvc {this};
public:
void Configure() {
// This is called when the factory is instantiated.
// Use this callback to make sure the algorithm is configured.
// The logger, parameters, and services have all been fetched before this is called
}
void ChangeRun(int32_t run_number) {
// This is called whenever the run number is changed.
// Use this callback to retrieve state that is keyed off of run number.
}
void Process(int32_t run_number, uint64_t event_number) {
// This is called on every event.
// Use this callback to call your Algorithm using all inputs and outputs
// The inputs will have already been fetched for you at this point.
// m_algo->process({...}, {...});
logger()->debug( "Event {}: Calling Process()", event_number );
}
};
The JOmniFactory inputs and outputs
The user specifies the JOmniFactory’s inputs by declaring PodioInput or VariadicPodioInput objects as data members. These are templated on the basic PODIO type (Not the collection type or mutable type or object type or data type), and require the user to pass this as a constructor argument. These objects immediately register themselves with the factory, so that the factory always knows exactly what data it needs to fetch. To access the data once it has been fetched, the user can call the object’s operator(), which returns a constant pointer to a PODIO collection of the correct type. For instance, suppose the user declares the data member:
PodioInput<edm4hep::MCParticle> m_particles_in {this};
In this case, the user would access the input data like this:
const edm4hep::MCParticleCollection* particles_in = m_particles_in();
Of course, for brevity, the user could simply pass m_particles_in() straight into the algorithm, and write the algorithm output through m_particles_out().get() — which is the pattern most factories use today:
m_algo->process({m_particles_in()}, {m_particles_out().get()});
As you have just seen, PodioOutputs are very analogous to PodioInputs.
Exercise:
- Create your own ElectronReconstruction factory from the code skeleton above
- Give your OmniFactory a single output collection
- Have its Process() method produce some log output
- Experiment with giving it different input collections
Key Points
Calling a factory
Overview
Teaching: 10 min
Exercises: 1 minQuestions
Objectives
Learn how to wire a factory using a factory generator
Learn how to call the factory as a once-off
Learn how to call the factory every time
Wiring a factory using a factory generator
Instead of handing over the OmniFactory to JANA directly, we create a JOmniFactoryGeneratorT which is basically a recipe that JANA uses to create factories later. There are several reasons we need this:
-
JANA is designed to process events in parallel, when enabled. Factories are allowed to cache local state using member variables (e.g. calibrations and resources) Despite this, factories don’t need to be thread-safe, because each factory is only used by one thread at a time. This means that JANA needs to spin up at least one instance of each factory for each thread.
-
We want to be able to spin up separate instances of the same factory class (within the context of a single event and thread), and give them different parameter values and collection names. The way we manage these different instances is by giving each factory instance a unique prefix which will be used to namespace its parameters, inputs, outputs, and logger. This happens at the JFactoryGenerator level.
Here is how you set up a factory generator:
app->Add(new JOmniFactoryGeneratorT<MC2ReconstructedParticle_factory>(
"GeneratedParticles",
{"MCParticles"},
{"GeneratedParticles"},
app
));
In this example, “GeneratedParticles” is the factory instance’s unique tag, {"MCParticles"} is the list of input collection names, and {"GeneratedParticles"} is the list of output collection names. Some observations:
-
If you are only creating one instance of this factory, feel free to use the “primary” output collection name as the factory prefix. (This has to be unique because PODIO collection names have to be unique.)
-
Collection names are positional, so they need to be in the same order as the
PodioInputandVariadicPodioInputdeclarations in the factory. -
Variadic inputs are a little bit interesting: You can have any number of variadic inputs mixed in among the non-variadic inputs, as long as there are the same number of collection names for each variadic input. If this confuses you, just restrict yourself to one variadic input and put it as the very last input, like most programming languages do.
-
When assigning names to collections and
JOmniFactoryprefixes, uniqueness is extremely important! JANA will throw an error if it detects a naming collision, but because of the dynamic plugin loading, some collisions can’t be detected. DO NOT use the same output collection name or prefix in different plugins, and DO NOT rely the plugin loading order to make sure you ran the “correct” factory! To swap out different versions of a factory, change or override the input collection name on the factories and processors downstream.
Where to put factory generators
Factory generators need to be added inside an InitPlugin for a particular plugin. If the factory generator is specific to one particular detector, it would go in that detector’s plugin. Because electron reconstruction is not, the generator is set up in one of the plugins under src/global. Because this falls in the category of reconstruction, we put it in src/global/reco/reco.cc.
Calling the factory
To temporarily include your factory’s outputs in the output file
On the command line, set the podio:output_collections parameter to include your collection names:
eicrecon -Ppodio:output_collections=MyNewCollectionName1,MyNewCollectionName2 in.root
To permanently include your factory’s outputs in the output file:
Add your collection name to the output_collections list in src/services/io/podio/JEventProcessorPODIO.cc:
std::vector<std::string> output_collections = {
// Header and other metadata
"EventHeader",
// Truth record
"MCParticles",
"MCBeamElectrons",
"MCBeamProtons",
// ...
To temporarily use your factory’s outputs as inputs to another factory
eicrecon -Ptargetfactory:InputTags=MyNewCollectionName1,MyNewCollection2 in.root
To permanently use your factory’s outputs as inputs to another factory
Change the collection name in the JOmniFactoryGeneratorT:
app->Add(new JOmniFactoryGeneratorT<MC2ReconstructedParticle_factory>(
"GeneratedParticles",
{"MCParticlesSmeared"}, // <== Used to be "MCParticles"
{"GeneratedParticles"},
app
));
Exercise:
- Create a JOmniFactoryGenerator for your ElectronReconstruction factory
- Give your factory’s output collection a fun name
- Call your factory from the command line and verify that you see its logger output.
- Add it to the
JEventProcessorPODIO::output_collections, so that it gets called automatically.- Experiment with multiple factory generators so you can have multiple instances of the same factory
Key Points
Parameterizing a factory
Overview
Teaching: 5 min
Exercises: 0 minQuestions
Objectives
Learn how to set parameters on a factory
Learn how to override factories via a generator
Learn how to override factories via the command line
Learn how to access services from a factory
Setting parameters on a factory
Parameters are also handled using registered members. JOmniFactory provides a Parameter class template which can hold its own value, but in EICrecon we prefer to use Config structs. Thus JOmniFactory provides ParameterRef, which stores a reference into the Config object.
ParameterRef<double> m_samplingFraction {this, "samplingFraction", config().sampFrac};
ParameterRef<std::string> m_energyWeight {this, "energyWeight", config().energyWeight};
Parameters are fetched immediately before Configure() is called, so you may access them from any of the callbacks like so:
void Process(int32_t run_number, uint64_t event_number) {
logger()->debug( "Event {}: samplingFraction = {}", event_number, m_samplingFraction() );
}
Because we are using ParameterRefs, we can also access the field the ref points to directly:
void Process(int32_t run_number, uint64_t event_number) {
logger()->debug( "Event {}: samplingFraction = {}", event_number, config().sampFrac );
}
Config objects
We create a plain-old struct to hold our parameters. For now this config struct can live in the same header file as our factory, although eventually it should belong with the algorithm instead.
struct ReconstructedElectrons_config {
double threshold = 0.9;
int bucket_count = 4;
// ...
}
By passing it in to the JOmniFactory base class, we can make it automatically available via the config() method.
class ReconstructedElectrons_factory : public JOmniFactory<ReconstructedElectrons_factory, ElectronReconstructionConfig> {
...
}
Overriding parameters via a generator
If you use a Config object for your parameters, you can pass it in directly to the factory generator:
app->Add(new JOmniFactoryGeneratorT<BasicTestAlg>(
"FunTest", {"MyHits"}, {"MyClusters"},
{
.threshold = 6.1,
.bucket_count = 22
},
app));
Overriding parameters via the command line
We can override parameters on the command line like so:
eicrecon -PFunTest:threshold=12.0 in.root
Accessing services from a factory
Services are singletons that provide access to resources such as loggers, geometry, magnetic field maps, etc. Services need to be thread-safe because they are shared by different threads. The most relevant service right now is DD4hep_service. You obtain a service using a registered member like this:
Service<DD4hep_service> m_geoSvc {this};
Oftentimes we want to retrieve a resource from a Service and refresh it whenever the run number changes. OmniFactory provides Resource for this purpose.
Exercise:
- Give your factory a Config struct
- Give your Config struct some parameters
- Experiment with overriding parameter values in the generator and on the command line.
Key Points
Adding an algorithm
Overview
Teaching: 5 min
Exercises: 1 minQuestions
Objectives
Understand the difference between a factory and an algorithm
Understand where to put the algorithm code
Understand the basic algorithm interface
Understand how to call an algorithm from a factory
The difference between a factory and an algorithm
Algorithms are classes that perform one kind of calculation we need and they do so in a generic, framework-independent way. The core of an Algorithm is a method called process which takes a tuple of input PODIO collections and a tuple of output PODIO collections. Algorithms don’t know or care where the inputs come from and where they go. Algorithms also don’t know much about where their parameters come from; rather, they are passed a Config structure which contains the parameters’ values. The nice thing about algorithms is that they are simple to design and test, and easy to reuse for different detectors, frameworks, or even entire experiments.
In EICrecon, all new algorithms inherit from the templated algorithms::Algorithm<Input<...>, Output<...>> interface (provided by the eic/algorithms “framework-less framework”), and use the WithPodConfig<ConfigT> mixin to attach a configuration struct. The algorithms::Algorithm base provides logging facilities (info(), debug(), trace(), …) and a structured way of declaring inputs and outputs by their PODIO collection types.
Where to put the algorithm code
All algorithms that are not specific to a single detector should go under src/algorithms. Because this falls in the category of reconstruction, we’ll put it in src/algorithms/reco.
The basic algorithm interface
Here is a template for an algorithm header file:
#pragma once
#include <algorithms/algorithm.h>
// #include relevant edm4eic / edm4hep collection headers here
#include "MyAlgorithmNameConfig.h"
#include "algorithms/interfaces/WithPodConfig.h"
namespace eicrecon {
using MyAlgorithmNameAlgorithm =
algorithms::Algorithm<algorithms::Input<MyInputCollection>,
algorithms::Output<MyOutputCollection>>;
class MyAlgorithmName : public MyAlgorithmNameAlgorithm,
public WithPodConfig<MyAlgorithmNameConfig> {
public:
MyAlgorithmName(std::string_view name)
: MyAlgorithmNameAlgorithm{name,
{"inputCollectionName"},
{"outputCollectionName"},
"Short description of what this algorithm does."} {}
// init() is called once before processing starts. Most algorithms do not need it.
void init() final {};
// process() does the actual work for each event. The Input/Output tuples
// contain pointers to the PODIO collections.
void process(const Input&, const Output&) const final;
};
} // namespace eicrecon
A few things worth noting:
- The class is templated on the list of input and output collection types. The
InputandOutputaliases inside the class expand intostd::tupleof pointers (gsl::not_null<const T*>for inputs,T*for outputs). process()isconst— algorithms must not mutate their own state during event processing. Run-by-run state should be set up ininit()instead.- Logging is inherited from
algorithms::AlgorithmBase, so insideprocess()you simply callinfo(...),debug(...), ortrace(...)directly — no logger pointer needs to be passed in. - The configuration struct is held by
WithPodConfigand accessible as the protected memberm_cfg.
The corresponding implementation file unpacks the input and output tuples with structured bindings:
#include "MyAlgorithmName.h"
namespace eicrecon {
void MyAlgorithmName::process(const Input& input, const Output& output) const {
const auto [in_particles] = input;
auto [out_particles] = output;
// ... fill out_particles using in_particles and m_cfg ...
}
} // namespace eicrecon
How to call an algorithm from a factory
The code to call an algorithm from a factory generally follows a specific pattern:
void Configure() {
// This is called when the factory is instantiated.
// Use this callback to make sure the algorithm is configured.
// The logger, parameters, and services have all been fetched before this is called
// Construct the algorithm with the factory's prefix as its name —
// this is what hooks the algorithm's logger up to the same prefix as the factory.
m_algo = std::make_unique<eicrecon::ElectronReconstruction>(GetPrefix());
// Forward the JANA log level down to the algorithm.
m_algo->level(static_cast<algorithms::LogLevel>(logger()->level()));
// Pass the config object to the algorithm.
m_algo->applyConfig(config());
// Call init() once. Note that init() takes no arguments — services
// (e.g. geometry) are accessed by the algorithms framework via the
// algorithms::ServiceSvc / algorithms::GeoSvc, not by passing pointers in.
m_algo->init();
}
void Process(int32_t /* run_number */, uint64_t /* event_number */) {
// This is called on every event. The inputs will have already been fetched.
// Call process() with brace-enclosed lists of input pointers and output pointers.
m_algo->process({m_in_particles()}, {m_out_particles().get()});
}
Exercise:
- Create your own ElectronReconstruction algorithm using the code skeleton above.
- Print some log messages from your algorithm’s
process()method usinginfo(...)/debug(...).- Have your ElectronReconstruction factory call the algorithm.
- Run this end-to-end.
Key Points
Working with PODIO
Overview
Teaching: 5 min
Exercises: 1 minQuestions
Objectives
Gain familiarity working with PODIO collections
Understand PODIO subset collections
Introduction to PODIO
Our data model is in a library/namespace/repository called edm4eic, and it is built on top of edm4hep, a data model designed to capture commonalities across HEP experiments. edm4eic is implemented using PODIO, which is a toolkit for generating the data model classes from a specification written in YAML. Here is a very simple example of a PODIO specification:
options :
# should getters / setters be prefixed with get / set?
getSyntax: False
# should POD members be exposed with getters/setters in classes that have them as members?
exposePODMembers: True
includeSubfolder: True
datatypes :
ExampleHit :
Description : "Hit"
Author : "B. Hegner"
Members:
- unsigned long long cellID // cellID
- double x // x-coordinate
- double y // y-coordinate
- double z // z-coordinate
- double energy // measured energy deposit
ExampleCluster :
Description : "Cluster"
Author : "N. Brei"
Members:
- double energy // cluster energy
OneToManyRelations:
- ExampleHit Hits // hits contained in the cluster
- ExampleCluster Clusters // sub clusters used to create this cluster
PODIO will then generate for us the following classes:
DatamodelDefinition.h ExampleCluster.h ExampleClusterCollection.h ExampleClusterCollectionData.h ExampleClusterData.h ExampleClusterObj.h ExampleHit.h ExampleHitCollection.h ExampleHitCollectionData.h ExampleHitData.h ExampleHitObj.h MutableExampleHit.h MutableExampleCluster.h
As you can see, PODIO has a lot of moving pieces. Why?
- PODIO adds a separate layer for managing memory in a way which is more consistent with Python and other garbage-collected languages. The user only has to work with values, no explicit allocations or deletions.
- PODIO separates the data’s memory layout from its accessors
- PODIO enforces immutability directly in the object model
- PODIO has sophisticated (though fragile!) mechanisms for tracking object references
These design principles in principle should eliminate entire classes of bugs. However, there are still subtleties when using PODIO that can lead to leaks, crashes, or corrupted references. Luckily, the correct usage pattern is quite simple, as we will discuss next.
Working with PODIO objects, collections, and subset collections:
auto hits = std::make_unique<ExampleHitCollection>();
hits->push_back(ExampleHit(22, 0.0, 0.0, 0.0, 0.001));
MutableExampleHit hit;
hit.x(0.0);
hit.energy(0.001);
// ...
hits->push_back(hit);
MutableExampleCluster cluster;
cluster.addHits(hit);
// Safety tip: Add object to a collection BEFORE creating an association to it
auto clusters = std::make_unique<ExampleClusterCollection>();
clusters->push_back(cluster);
auto subset_clusters = std::make_unique<ExampleClusterCollection>();
subset_clusters->setSubsetCollection(true);
subset_clusters->push_back(cluster);
// Safety tip: Every PODIO object is owned by exactly one collection.
// If you want to put the object in other collections, those collections need to
// be designated as "subset collections", which means that they don't own their contents.
Note that when you write a factory, its inputs will be const ExampleHitCollection*, which are immutable.
Its output is held by a PodioOutput<ExampleHit> member; calling m_output() returns a std::unique_ptr<ExampleHitCollection>& that the factory can mutate, and m_output().get() gives a raw mutable pointer that you can hand to your algorithm’s process() method. After Process() returns, JOmniFactory transfers ownership of that collection to JANA2, which adds it to a podio Frame. From that point on, the collection is immutable and owned by the Frame.
JANA2 will create and destroy Frames internally.
Exercise:
- Have your algorithm produce some (fake) output data!
Key Points
Putting everything together
Overview
Teaching: 5 min
Exercises: 0 minQuestions
Objectives
Get the electron finder running end-to-end
The final ReconstructedElectron factory
Here is the final ReconstructedElectrons_factory. The current implementation in main is much simpler than the variadic-input version that was originally drafted: by the time we reach the reco stage, charged tracks have already been combined with calorimeter clusters into edm4eic::ReconstructedParticle objects (each carrying a getClusters() relation), so we only need to read that one collection and apply the E/p cut on it.
src/factories/reco/ReconstructedElectrons_factory.h:
#pragma once
#include "extensions/jana/JOmniFactory.h"
#include "algorithms/reco/ElectronReconstruction.h"
namespace eicrecon {
class ReconstructedElectrons_factory
: public JOmniFactory<ReconstructedElectrons_factory, ElectronReconstructionConfig> {
public:
using AlgoT = eicrecon::ElectronReconstruction;
private:
// Underlying algorithm
std::unique_ptr<AlgoT> m_algo;
// Declare inputs
PodioInput<edm4eic::ReconstructedParticle> m_in_rc_particles{this, "ReconstructedParticles"};
// Declare outputs
PodioOutput<edm4eic::ReconstructedParticle> m_out_reco_particles{this};
// Declare parameters
ParameterRef<double> m_min_energy_over_momentum{this, "minEnergyOverMomentum",
config().min_energy_over_momentum};
ParameterRef<double> m_max_energy_over_momentum{this, "maxEnergyOverMomentum",
config().max_energy_over_momentum};
public:
void Configure() {
// This is called when the factory is instantiated.
// Use this callback to make sure the algorithm is configured.
// The logger, parameters, and services have all been fetched before this is called
m_algo = std::make_unique<AlgoT>(GetPrefix());
m_algo->level(static_cast<algorithms::LogLevel>(logger()->level()));
// Pass config object to algorithm
m_algo->applyConfig(config());
m_algo->init();
}
void Process(int32_t /* run_number */, uint64_t /* event_number */) {
// This is called on every event.
// Use this callback to call your Algorithm using all inputs and outputs.
// The inputs will have already been fetched for you at this point.
m_algo->process({m_in_rc_particles()}, {m_out_reco_particles().get()});
logger()->debug("Found {} reconstructed electron candidates", m_out_reco_particles()->size());
}
};
} // namespace eicrecon
Note that ChangeRun() is omitted entirely — the JOmniFactory base class provides a default no-op implementation, so a factory that doesn’t need to react to run-number changes does not have to override it.
Next, we register this with the reco plugin in src/global/reco/reco.cc:
app->Add(new JOmniFactoryGeneratorT<ReconstructedElectrons_factory>(
"ReconstructedElectrons", {"ReconstructedParticles"}, {"ReconstructedElectrons"}, {}, app));
You can also create additional instances of the same factory with different parameter values. For example, the same reco.cc registers a second instance, ReconstructedElectronsForDIS, with a wider E/p window:
app->Add(new JOmniFactoryGeneratorT<ReconstructedElectrons_factory>(
"ReconstructedElectronsForDIS", {"ReconstructedParticles"}, {"ReconstructedElectronsForDIS"},
{
.min_energy_over_momentum = 0.7, // GeV
.max_energy_over_momentum = 1.3 // GeV
},
app));
And finally, we add its output collection name to the output include list in src/services/io/podio/JEventProcessorPODIO.cc:
"ReconstructedElectrons",
The ElectronReconstruction algorithm
src/algorithms/reco/ElectronReconstruction.h:
#pragma once
#include <algorithms/algorithm.h>
#include <edm4eic/ReconstructedParticleCollection.h>
#include <string>
#include <string_view>
#include "ElectronReconstructionConfig.h"
#include "algorithms/interfaces/WithPodConfig.h"
namespace eicrecon {
using ElectronReconstructionAlgorithm =
algorithms::Algorithm<algorithms::Input<edm4eic::ReconstructedParticleCollection>,
algorithms::Output<edm4eic::ReconstructedParticleCollection>>;
class ElectronReconstruction : public ElectronReconstructionAlgorithm,
public WithPodConfig<ElectronReconstructionConfig> {
public:
ElectronReconstruction(std::string_view name)
: ElectronReconstructionAlgorithm{name,
{"inputParticles"},
{"outputParticles"},
"selected electrons from reconstructed particles"} {}
void init() final {};
void process(const Input&, const Output&) const final;
};
} // namespace eicrecon
A few things to note about the modern algorithm interface:
- The algorithm inherits from
algorithms::Algorithm<Input<...>, Output<...>>, so its inputs and outputs are declared as types in the template parameter list. The compiler then checks at the call site that the right collection types are passed in. WithPodConfig<ElectronReconstructionConfig>provides the protected memberm_cfg, which is what the algorithm uses to access its configuration values.- Logging methods (
info,debug,trace,warning,error) are inherited — there is nom_logmember to manage by hand. process()isconst. Any state that has to be set up once per job goes ininit()(which here is empty); state that depends on the run number can be retrieved on the fly fromalgorithms::GeoSvcand friends.
The algorithm’s parameters live in a Config struct at src/algorithms/reco/ElectronReconstructionConfig.h:
namespace eicrecon {
struct ElectronReconstructionConfig {
double min_energy_over_momentum = 0.9;
double max_energy_over_momentum = 1.2;
};
} // namespace eicrecon
The algorithm itself lives at src/algorithms/reco/ElectronReconstruction.cc:
#include "ElectronReconstruction.h"
#include <edm4eic/ClusterCollection.h>
#include <edm4eic/ReconstructedParticleCollection.h>
#include <edm4hep/utils/vector_utils.h>
#include <fmt/core.h>
#include <podio/RelationRange.h>
#include <gsl/pointers>
#include "algorithms/reco/ElectronReconstructionConfig.h"
namespace eicrecon {
void ElectronReconstruction::process(const Input& input, const Output& output) const {
const auto [rcparts] = input;
auto [out_electrons] = output;
// out_electrons is a *subset* collection of the input ReconstructedParticles —
// PODIO objects are owned by exactly one collection, and a subset collection
// lets us refer to existing objects without copying or transferring ownership.
out_electrons->setSubsetCollection();
for (const auto particle : *rcparts) {
// Skip particles without an associated cluster (no calorimeter info)
if (particle.getClusters().empty()) {
continue;
}
// Skip neutral particles (no track momentum to compare against)
if (particle.getCharge() == 0) {
continue;
}
double E = particle.getClusters()[0].getEnergy();
double p = edm4hep::utils::magnitude(particle.getMomentum());
double EOverP = E / p;
trace("ReconstructedElectron: Energy={} GeV, p={} GeV, E/p = {} for PDG (from truth): {}",
E, p, EOverP, particle.getPDG());
// Apply the E/p cut here to select electrons
if (EOverP >= m_cfg.min_energy_over_momentum &&
EOverP <= m_cfg.max_energy_over_momentum) {
out_electrons->push_back(particle);
}
}
debug("Found {} electron candidates", out_electrons->size());
}
} // namespace eicrecon
Compared to the truth-association-based draft from earlier in the tutorial, this version is much shorter because:
- The track ↔ cluster matching has already been done upstream and is exposed via
ReconstructedParticle::getClusters(). - We don’t need any
MCRecoParticleAssociationorMCRecoClusterParticleAssociationcollections — the only input is the mergedReconstructedParticles. - We don’t allocate new particles; we just keep references to the originals via a subset collection (
setSubsetCollection()).
The exercise from earlier episodes — wiring a custom variant that takes truth associations as additional inputs — is still a useful one. Compare your version with the version on main to see the trade-offs between richness of inputs and code simplicity.
Key Points