This lesson is still being designed and assembled (Pre-Alpha version)

Reconstruction Algorithms

Introduction

Overview

Teaching: 5 min
Exercises: 0 min
Questions
Objectives
  • Define physics goal of new reconstruction algorithm

  • Identify what information is needed to accomplish this goal

Background

The simulation-reconstruction-analysis pipeline

The ePIC software pipeline consists of the following stages.

  1. Event generation
  2. Simulation
  3. Digitization (eventually)
  4. Reconstruction
  5. Analysis

This tutorial focuses exclusively on the reconstruction stage and the EICrecon software package. At the highest conceptual level, EICrecon’s inputs are simulated (and eventually, optionally digitized) detector hits, and the outputs are reconstructed tracks and particle identifications. Internally, EICrecon uses a data model and software architecture which is designed to cleanly separate dozens of different reconstruction algorithms and facilitate reconfiguring them and reusing them according to users’ needs.

Levels of user interaction with reconstruction

Depending on how a user contributes to ePIC, their level of interaction with EICrecon will vary. We identify four distinct patterns, or levels of interaction:

  1. Performing analyses using the reconstructed data as-is
  2. Modifying an algorithm’s parameters and re-running the reconstruction
  3. Applying an existing algorithm to a fresh context
  4. Adding a new algorithm

This tutorial is designed to be most helpful for users doing (3) or (4), although users at all levels will benefit from the deeper understanding of EICrecon’s architecture.

Motivating example: simple electron ID with E/p cut

One key ingredient in electron ID is the ratio of the energy deposited in the calorimeter (E) to the momentum of the particle track (p). For electrons, this should be close to 1. We want to implement a rudimentary electron ID algorithm by identifying particles that satisfy 0.9 < E/p < 1.2.

What information is required?

This simple electron ID algorithm requires three pieces of information, which will be obtained from pre-existing algorithms/factories:

Matching between tracks and clusters can be obtained from truth information, or from track projections.

The input and output objects of our factory should be stored as PODIO collections.

The members of each of these data types can be found in edm4eic.yaml in the EDM4eic repository.

Key Points


Creating a factory

Overview

Teaching: 10 min
Exercises: 1 min
Questions
Objectives
  • Understand the basics of EICrecon’s plugin structure

  • Understand where to put new factories

  • Understand which factory base class to use

  • Understand the JOmniFactory interface

Algorithms and Factories

We make a crisp distinction between algorithms and factories.

Algorithms are classes that perform one kind of calculation we need and they do so in a generic, framework-independent way. Factories, in comparison, are classes that attach Algorithms to the JANA framework. You should write Algorithms to be independently testable, and you should write a Factory so that JANA can use the Algorithm within EICrecon. The factory layer handles issues like obtaining all of the inputs from other factories, publishing the outputs so that other factories can use them or they can be written to file, obtaining the correct parameters, making sure the Algorithm has been initialized, and making sure that the correct calibrations are loaded when the run number changes.

Here’s an example to help illustrate what goes into the Algorithm and what goes into the Factory. Consider calorimeter clustering. The clustering algorithm should be independent of any individual detector, and have a set of parameters that control its behavior and live in a plain-old-data Config object. You could copy-paste this code into a codebase that uses a completely different reconstruction framework and it would still work, as long as you were using the same datamodel (e.g. edm4hep). Each detector could have its own factory (if it calls the algorithm in a substantially different way) or they may all use the same factory (if the factories only differ in their parameter values). The parameter values themselves could be hardcoded to the factory, but we strongly prefer to set them externally using a factory generator. This gives us a cleaner separation of configuration from code, and will let us do fun things in the future such as wiring factories together from an external config file, or performing parameter studies.

The basics of EICrecon’s plugin structure

JANA plugins are a mechanism for controlling which parts of EICrecon get compiled and linked together. They give us the ability to avoid having to compile and link heavy dependencies that not everybody will be using all the time. For instance, by default EICrecon uses ACTS for tracking, but perhaps someone wants to benchmark ACTS against Genfit – we wouldn’t want to have to ship Genfit inside eic-shell all the time.

Plugins were also designed so that users could integrate their analyses directly into reconstruction while keeping them independent and optional. This pattern is heavily used in the GlueX experiment and recommended in the tutorials on JANA’s own documentation. In EICrecon, we set up separate plugins for each detector and each benchmark, but not for each analysis. We strongly recommend following the advice given in the analysis tutorials instead. The instructions for adding a new plugin are here.

Where to put new factories

The EICrecon plugins are organized as follows. Under src/detectors we have subdirectories for each individual detector, and each of them corresponds to one plugin that adds the detector’s factory generators. Benchmarks are analogous. If an algorithm/factory will only ever be used in that one context, it can live there; otherwise, and preferably, the corresponding algorithm lives under src/algorithms and the corresponding factory lives under src/factories.

Once you figure out which plugin your algorithm naturally belongs to, find its InitPlugin() method. By convention this lives in a .cc file with the same name as the plugin itself. This is where you will add your factory generator.

Which factory base class to use

There are a number of different kinds of factories available in JANA which we have used within EICrecon at different points in time. Luckily, if you are writing an Algorithm from scratch, there is only one you will need to be familiar with: JOmniFactory. However, some of the earlier ones are still around, and just in case you need to modify or reuse those, here is a quick history lesson.

JFactoryT<T> is JANA’s fundamental factory base class. However, we don’t use it in EICrecon because it has the following limitations:

  1. It has difficulty with PODIO data. PODIO data needs very special handling, otherwise it will leak memory or corrupt your object associations. To address this, we developed JFactoryPodioT, which extends JFactoryT to support PODIO correctly.

  2. It can only output one collection. This might seem fine at first, but frequently we need to output “Association” collections alongside the primary output collection. To address this, we developed JMultifactory, which supports multiple outputs, including PODIO data.

  3. If you want to reuse an Algorithm in a different context, you need to duplicate the JFactoryT/JPodioFactoryT/JMultifactory. Until this point, collection and parameter names were hardcoded inside individual factories. To get around this, we developed JChainMultifactoryT so that we could create multiple instances of the same factory and assign them different collection and parameter names in a logical way.

  4. It requires a deeper understanding of JANA internals to use correctly. The user is allowed to perform actions inside the factory callbacks that don’t necessarily make sense. We remedied this issue by developing JOmniFactory, which declares what it needs upfront, and JANA provides it only when it makes sense. JOmniFactory supports all of the functionality developed for points (1), (2), and (3), and presents a simpler interface.

In summary, always use JOmniFactory if you are writing something new. All existing factories in EICrecon are in the process of being migrated right now: https://github.com/eic/EICrecon/issues/1176.

The JOmniFactory interface

The basic idea behind an OmniFactory is to declare what you need upfront. That way, the framework can retrieve everything you need at the right time, and it can handle complex namespacing logic behind the scenes so that you can dynamically rewire and reconfigure factories.

Earlier factory base classes, such as JChainMultifactory, require users to do a lot in their callbacks. Not so with JOmniFactory, which moves most of the functionality into registered members instead, as we shall discuss later. The callbacks are still there, but are made much simple, and focus on satisfying the underlying Algorithm’s needs instead of JANA’s.

These are the callbacks you’ll need to implement:

    void Configure();
    void ChangeRun(int64_t run_number);
    void Process(int64_t run_number, uint64_t event_number);

Configure is called once when the factory is instantiated. This is where the user should initialize the underlying Algorithm. JANA will have already fetched the services, configured the logger, and set the values of the Config struct, so all the user needs to do is pass these things to the Algorithm.

ChangeRun is called once JANA detects that a new run has been started. This is where the user should update calibration data or other resources keyed off of the run number. JOmniFactory also provides a Resource registered member to automatically retrieve data from an arbitrary Service, though this is still experimental.

Process is called for every event. (Side note: Although note that because different threads have different factory instances, any individual factory cannot be guaranteed to witness the entire event stream. If you need to have one instance that processes the entire event stream, JANA provides JEventProcessors and JEventProcessorSequential for that purpose.) JANA will have already prefetched the registered Inputs before Process is called. The user needs to execute the Algorithm using those inputs, and copy the resulting outputs back to the registered Outputs. JANA will then take care of publishing the outputs downstream.

Note that unlike earlier factory base classes, JOmniFactory uses the Curiously Recurring Template Pattern so that the callback methods aren’t virtual. This lets the optimizer get rid of any performance penalty for the extra layer of indirection.

Here is the full JOmniFactory code skeleton:


#pragma once
#include "extensions/jana/JOmniFactory.h"

class ReconstructedElectrons_factory : public JOmniFactory<ReconstructedElectrons_factory> {
private:

    // Declare inputs and outputs
    // PodioInput<edm4hep::MCParticle> m_in_mc_particles {this, "MCParticles"};
    // PodioOutput<edm4eic::ReconstructedParticle> m_out_reco_particles {this};

    // Declare parameters
    // ParameterRef<double> m_min_energy_over_momentum {this, "minEnergyOverMomentum", config().min_energy_over_momentum};
 
    // Declare services
    // Service<DD4hep_service> m_geoSvc {this};

public:
    void Configure() {
        // This is called when the factory is instantiated.
        // Use this callback to make sure the algorithm is configured.
        // The logger, parameters, and services have all been fetched before this is called
    }

    void ChangeRun(int64_t run_number) {
        // This is called whenever the run number is changed.
        // Use this callback to retrieve state that is keyed off of run number.
    }

    void Process(int64_t run_number, uint64_t event_number) {
        // This is called on every event.
        // Use this callback to call your Algorithm using all inputs and outputs
        // The inputs will have already been fetched for you at this point.
        // m_algo->execute(...);

        logger()->debug( "Event {}: Calling Process()", event_number );
    }
};

The JOmniFactory inputs and outputs

The user specifies the JOmniFactory’s inputs by declaring PodioInput or VariationalPodioInput objects as data members. These are templated on the basic PODIO type (Not the collection type or mutable type or object type or data type), and require the user to pass this as a constructor argument. These objects immediately register themselves with the factory, so that the factory always knows exactly what data it needs to fetch. To access the data once it has been fetched, the user can call the object’s operator(), which returns a constant pointer to a PODIO collection of the correct type. For instance, suppose the user declares the data member:

PodioInput<MCParticles> m_particles_in {this};

In this case, the user would access the input data like this:

const MCParticlesCollection* particles_in = m_particles_in();

Of course, for brevity, the user could simply write this instead:

m_particles_out() = smearing_algo->execute( m_particles_in() );

As you have just seen, PodioOutputs are very analogous to PodioInputs.

Exercise:

  • Create your own ElectronReconstruction factory from the code skeleton above
  • Give your OmniFactory a single output collection
  • Have its Process() method produce some log output
  • Experiment with giving it different input collections

Key Points


Calling a factory

Overview

Teaching: 10 min
Exercises: 1 min
Questions
Objectives
  • Learn how to wire a factory using a factory generator

  • Learn how to call the factory as a once-off

  • Learn how to call the factory every time

Wiring a factory using a factory generator

Instead of handing over the OmniFactory to JANA directly, we create a JOmniFactoryGeneratorT which is basically a recipe that JANA uses to create factories later. There are several reasons we need this:

  1. JANA is designed to process events in parallel, when enabled. Factories are allowed to cache local state using member variables (e.g. calibrations and resources) Despite this, factories don’t need to be thread-safe, because each factory is only used by one thread at a time. This means that JANA needs to spin up at least one instance of each factory for each thread.

  2. We want to be able to spin up separate instances of the same factory class (within the context of a single event and thread), and give them different parameter values and collection names. The way we manage these different instances is by giving each factory instance a unique prefix which will be used to namespace its parameters, inputs, outputs, and logger. This happens at the JFactoryGenerator level.

Here is how you set up a factory generator:

    app->Add(new JOmniFactoryGeneratorT<MC2SmearedParticle_factory>(
            "GeneratedParticles",
            {"MCParticles"},
            {"GeneratedParticles"},
            app
            ));

In this example, “GeneratedParticles” is the factory instance’s unique tag, {"MCParticles"} is the list of input collection names, and {"GeneratedParticles"} is the list of output collection names. Some observations:

Where to put factory generators

Factory generators need to be added inside an InitPlugin for a particular plugin. If the factory generator is specific to one particular detector, it would go in that detector’s plugin. Because electron reconstruction is not, the generator is set up in one of the plugins under src/global. Because this falls in the category of reconstruction, we put it in src/global/reco/reco.cc.

Calling the factory

To temporarily include your factory’s outputs in the output file

On the command line, set the podio:output_include_collections parameter to include your collection names:

eicrecon -Ppodio:output_include_collections=MyNewCollectionName1,MyNewCollectionName2 in.root

To permanently include your factory’s outputs in the output file:

Add your collection name to the output_include_collections list in src/services/io/podio/JEventProcessorPODIO.cc:44

    std::vector<std::string> output_include_collections={
            "EventHeader",
            "MCParticles",
            "CentralTrackingRecHits",
            "CentralTrackSeedingResults",
            "CentralTrackerMeasurements",
            //...

To temporarily use your factory’s outputs as inputs to another factory

eicrecon -Ptargetfactory:InputTags=MyNewCollectionName1,MyNewCollection2 in.root

To permanently use your factory’s outputs as inputs to another factory

Change the collection name in the OmniFactoryGeneratorT or JChainMultifactoryGeneratorT:

    app->Add(new JOmniFactoryGeneratorT<MC2SmearedParticle_factory>(
            "GeneratedParticles",
            {"MCParticlesSmeared"},            // <== Used to be "MCParticles"
            {"GeneratedParticles"},
            app
            ));

Exercise:

  • Create a JOmniFactoryGenerator for your ElectronReconstruction factory
  • Give your factory’s output collection a fun name
  • Call your factory from the command line and verify that you see its logger output.
  • Add it to the JEventSourcePODIO::output_include_collections, so that it gets called automatically.
  • Experiment with multiple factory generators so you can have multiple instances of the same factory

Key Points


Parameterizing a factory

Overview

Teaching: 5 min
Exercises: 0 min
Questions
Objectives
  • Learn how to set parameters on a factory

  • Learn how to override factories via a generator

  • Learn how to override factories via the command line

  • Learn how to access services from a factory

Setting parameters on a factory

Parameters are also handled using registered members. JOmniFactory provides a Parameter class template which can hold its own value, but in EICrecon we prefer to use Config structs. Thus JOmniFactory provides ParameterRef, which stores a reference into the Config object.

    ParameterRef<double> m_samplingFraction {this, "samplingFraction", config().sampFrac};
    ParameterRef<std::string> m_energyWeight {this, "energyWeight", config().energyWeight};

Parameters are fetched immediately before Init() is called, so you may access them from any of the callbacks like so:

    void Process(int64_t run_number, uint64_t event_number) {
        logger()->debug( "Event {}: samplingFraction = {}", event_number, m_samplingFraction() );
    }

Because we are using ParameterRefs, we can also access the field the ref points to directly:

    void Process(int64_t run_number, uint64_t event_number) {
        logger()->debug( "Event {}: samplingFraction = {}", event_number, config().sampFrac );
    }

Config objects

We create a plain-old struct to hold our parameters. For now this config struct can live in the same header file as our factory, although eventually it should belong with the algorithm instead.

struct ReconstructedElectrons_config {
    double threshold = 0.9;
    int bucket_count = 4;
    // ...
}

By passing it in to the JOmniFactory base class, we can make it automatically available via the config() method.

class ReconstructedElectrons_factory : public JOmniFactory<ReconstructedElectrons_factory, ElectronReconstructionConfig> {
    ...
}

Overriding parameters via a generator

If you use a Config object for your parameters, you can pass it in directly to the factory generator:

    app->Add(new JOmniFactoryGeneratorT<BasicTestAlg>(
        "FunTest", {"MyHits"}, {"MyClusters"}, 
        {
          .threshold = 6.1,
          .bucket_count = 22
        },
        app));

Overriding parameters via the command line

We can override parameters on the command line like so:

eicrecon -PFunTest:threshold=12.0 in.root

Accessing services from a factory

Services are singletons that provide access to resources such as loggers, geometry, magnetic field maps, etc. Services need to be thread-safe because they are shared by different threads. The most relevant service right now is DD4hep_service. You obtain a service using a registered member like this:

    Service<DD4hep_service> m_geoSvc {this};

Oftentimes we want to retrieve a resource from a Service and refresh it whenever the run number changes. OmniFactory provides Resource for this purpose.

Exercise:

  • Give your factory a Config struct
  • Give your Config struct some parameters
  • Experiment with overriding parameter values in the generator and on the command line.

Key Points


Adding an algorithm

Overview

Teaching: 5 min
Exercises: 1 min
Questions
Objectives
  • Understand the difference between a factory and an algorithm

  • Understand where to put the algorithm code

  • Understand the basic algorithm interface

  • Understand how to call an algorithm from a factory

The difference between a factory and an algorithm

Algorithms are classes that perform one kind of calculation we need and they do so in a generic, framework-independent way. The core of an Algorithm is a method called execute which inputs some PODIO collections and outputs some other PODIO collections. Algorithms don’t know or care where the inputs come from and where they go. Algorithms also don’t know much about where their parameters come from; rather, they are passed a Config structure which contains the parameters’ values. The nice thing about algorithms is that they are simple to design and test, and easy to reuse for different detectors, frameworks, or even entire experiments.

Most of what makes an Algorithm an Algorithm is convention. (These are largely inspired by the KISS principle in software engineering!) There is an ongoing effort to create a “framework-less framework” for formally expressing Algorithms using templates, which lives at https://github.com/eic/algorithms. Eventually, we may encourage users to have all Algorithms inherit from the Algorithm<Input<...>, Output<...>> templated interface. For now, however, just follow the Algorithm conventions that we will go over next.

Where to put the algorithm code

All algorithms that are not specific to a single detector should go under src/algorithms. Because this falls in the category of reconstruction, we’ll put it in src/algorithms/reco.

The basic algorithm interface

Here is a template for an algorithm header file:


#pragma once

// #include relevant header files here

namespace eicrecon {

    class MyAlgorithmName {

    public:
            
        // init function contains any required initialization
        void init();

        // execute function contains main algorithm processes
        // (e.g. manipulate existing objects to create new objects)
        std::unique_ptr<MyReturnDataType> execute();
        
        // Any additional public members go here 

    private:
        std::shared_ptr<spdlog::logger> m_log;
        // any additional private members (e.g. services and calibrations) go here

    };
} // namespace eicrecon

How to call an algorithm from a factory

The code to call an algorithm from a factory generally follows a specific pattern:

    void Configure() {
        // This is called when the factory is instantiated.
        // Use this callback to make sure the algorithm is configured.
        // The logger, parameters, and services have all been fetched before this is called
        m_algo = std::make_unique<eicrecon::ElectronReconstruction>();

        // Pass config object to algorithm
        m_algo->applyConfig(config());

        // If we needed geometry, we'd obtain it like so
        // m_algo->init(m_geoSvc().detector(), m_geoSvc().converter(), logger());

        m_algo->init(logger());
    }

    void Process(int64_t run_number, uint64_t event_number) {
        // This is called on every event.
        // Use this callback to call your Algorithm using all inputs and outputs
        // The inputs will have already been fetched for you at this point.
        auto output = m_algo->execute(
          m_in_mc_particles(),
          m_in_rc_particles(),
          m_in_rc_particles_assoc(),
          m_in_clu_assoc()
        );

        m_out_reco_particles() = std::move(output);
        // JANA will take care of publishing the outputs for you.
    }

Exercise:

  • Create your own ElectronReconstruction algorithm using the code skeleton above.
  • Print some log messages from your algorithm’s execute() method.
  • Have your ElectronReconstruction factory call the algorithm.
  • Run this end-to-end.

Key Points


Working with PODIO

Overview

Teaching: 5 min
Exercises: 1 min
Questions
Objectives
  • Gain familiarity working with PODIO collections

  • Understand PODIO subset collections

Introduction to PODIO

Our data model is in a library/namespace/repository called edm4eic, and it is built on top of edm4hep, a data model designed to capture commonalities across HEP experiments. edm4eic is implemented using PODIO, which is a toolkit for generating the data model classes from a specification written in YAML. Here is a very simple example of a PODIO specification:

options :
  # should getters / setters be prefixed with get / set?
  getSyntax: False
  # should POD members be exposed with getters/setters in classes that have them as members?
  exposePODMembers: True
  includeSubfolder: True

datatypes :
  ExampleHit :
    Description : "Hit"
    Author : "B. Hegner"
    Members:
      - unsigned long long cellID      // cellID
      - double x      // x-coordinate
      - double y      // y-coordinate
      - double z      // z-coordinate
      - double energy // measured energy deposit

  ExampleCluster :
    Description : "Cluster"
    Author : "N. Brei"
    Members:
      - double energy // cluster energy
    OneToManyRelations:
      - ExampleHit Hits // hits contained in the cluster
      - ExampleCluster Clusters // sub clusters used to create this cluster

PODIO will then generate for us the following classes:

DatamodelDefinition.h ExampleCluster.h ExampleClusterCollection.h ExampleClusterCollectionData.h ExampleClusterData.h ExampleClusterObj.h ExampleHit.h ExampleHitCollection.h ExampleHitCollectionData.h ExampleHitData.h ExampleHitObj.h MutableExampleHit.h MutableExampleCluster.h

As you can see, PODIO has a lot of moving pieces. Why?

  1. PODIO adds a separate layer for managing memory in a way which is more consistent with Python and other garbage-collected languages. The user only has to work with values, no explicit allocations or deletions.
  2. PODIO separates the data’s memory layout from its accessors
  3. PODIO enforces immutability directly in the object model
  4. PODIO has sophisticated (though fragile!) mechanisms for tracking object references

These design principles in principle should eliminate entire classes of bugs. However, there are still subtleties when using PODIO that can lead to leaks, crashes, or corrupted references. Luckily, the correct usage pattern is quite simple, as we will discuss next.

Working with PODIO objects, collections, and subset collections:

auto hits = std::make_unique<ExampleHitCollection>();

hits->push_back(ExampleHit(22, 0.0, 0.0, 0.0, 0.001));

MutableExampleHit hit;
hit.x(0.0);
hit.energy(0.001);
// ...
hits->push_back(hit);

MutableExampleCluster cluster;
cluster.addHits(hit);

// Safety tip: Add object to a collection BEFORE creating an association to it

auto clusters = std::make_unique<ExampleClusterCollection>();
clusters->push_back(cluster);

auto subset_clusters = std::make_unique<ExampleClusterCollection>();
subset_clusters->setSubsetCollection(true);
subset_clusters->push_back(cluster);

// Safety tip: Every PODIO object is owned by exactly one collection.
// If you want to put the object in other collections, those collections need to 
// be designated as "subset collections", which means that they don't own their contents. 

Note that when you write a factory, its inputs will be const ExampleHitCollection*, which are immmutable. Its output will be std::unique_ptr<ExampleHitCollection>, which is still mutable but will transfer its ownership to JANA2. JANA2 will add the collection to a podio Frame. From that point on, the collection is immutable and owned by the Frame.

JANA2 will create and destroy Frames internally.

Exercise:

  • Have your algorithm produce some (fake) output data!

Key Points


Putting everything together

Overview

Teaching: 5 min
Exercises: 0 min
Questions
Objectives
  • Get the electron finder running end-to-end

The final ReconstructedElectron factory

Here is the final ReconstructedElectron factory. Our algorithm requires reconstructed tracks, calorimeter clusters, and associations between the two. As previously stated, the current implementation of the simple electron ID uses the truth association between tracks and clusters (association using matching between clusters and track projections will be implemented later). Thus, we need two sets of associations: association between the truth particle and the reconstructed charged particle, and association between the truth particle and the calorimeter cluster. Obviously, we will not create these objects from scratch. Rather, we will get them from the factories (and underlying algorithms) implemented to create these objects.

src/global/reco/ReconstructedElectrons_factory.h in branch nbrei_variadic_omnifactories:


#pragma once

#include "extensions/jana/JOmniFactory.h"

#include "algorithms/reco/ElectronReconstruction.h"


namespace eicrecon {

class ReconstructedElectrons_factory : public JOmniFactory<ReconstructedElectrons_factory, ElectronReconstructionConfig> {
private:

    // Underlying algorithm
    std::unique_ptr<eicrecon::ElectronReconstruction> m_algo;

    // Declare inputs
    PodioInput<edm4hep::MCParticle> m_in_mc_particles {this, "MCParticles"};
    PodioInput<edm4eic::ReconstructedParticle> m_in_rc_particles {this, "ReconstructedChargedParticles"};
    PodioInput<edm4eic::MCRecoParticleAssociation> m_in_rc_particles_assoc {this, "ReconstructedChargedParticleAssociations"};

    VariadicPodioInput<edm4eic::MCRecoClusterParticleAssociation> m_in_clu_assoc {this};

    // Declare outputs
    PodioOutput<edm4eic::ReconstructedParticle> m_out_reco_particles {this};

    // Declare parameters
    ParameterRef<double> m_min_energy_over_momentum {this, "minEnergyOverMomentum", config().min_energy_over_momentum};
    ParameterRef<double> m_max_energy_over_momentum {this, "maxEnergyOverMomentum", config().max_energy_over_momentum};

    // Declare services here, e.g.
    // Service<DD4hep_service> m_geoSvc {this};

public:
    void Configure() {
        // This is called when the factory is instantiated.
        // Use this callback to make sure the algorithm is configured.
        // The logger, parameters, and services have all been fetched before this is called
        m_algo = std::make_unique<eicrecon::ElectronReconstruction>();

        // Pass config object to algorithm
        m_algo->applyConfig(config());

        // If we needed geometry, we'd obtain it like so
        // m_algo->init(m_geoSvc().detector(), m_geoSvc().converter(), logger());

        m_algo->init(logger());
    }

    void ChangeRun(int64_t run_number) {
        // This is called whenever the run number is changed.
        // Use this callback to retrieve state that is keyed off of run number.
        // This state should usually be managed by a Service.
        // Note: You usually don't need this, because you can declare a Resource instead.
    }

    void Process(int64_t run_number, uint64_t event_number) {
        // This is called on every event.
        // Use this callback to call your Algorithm using all inputs and outputs
        // The inputs will have already been fetched for you at this point.
        auto output = m_algo->execute(
          m_in_mc_particles(),
          m_in_rc_particles(),
          m_in_rc_particles_assoc(),
          m_in_clu_assoc()
        );

        logger()->debug( "Event {}: Found {} reconstructed electron candidates", event_number, output->size() );

        m_out_reco_particles() = std::move(output);
        // JANA will take care of publishing the outputs for you.
    }
};
} // namespace eicrecon

Next, we register this with the reco plugin in src/global/reco/reco.cc:

    app->Add(new JOmniFactoryGeneratorT<ReconstructedElectrons_factory>(
        "ReconstructedElectrons",
        {"MCParticles", "ReconstructedChargedParticles", "ReconstructedChargedParticleAssociations",
        "EcalBarrelScFiClusterAssociations",
        "EcalEndcapNClusterAssociations",
        "EcalEndcapPClusterAssociations",
        "EcalEndcapPInsertClusterAssociations",
        "EcalLumiSpecClusterAssociations",
        },
        {"ReconstructedElectrons"},
        {},  // Override config values here
        app
    ));

And finally, we add its output collection name to the output include list in src/services/io/podio/JEventProcessorPODIO:

    "ReconstructedElectrons",

src/algorithms/reco/ElectronReconstruction.h:


#pragma once

#include <edm4eic/MCRecoClusterParticleAssociationCollection.h>
#include <edm4eic/MCRecoParticleAssociationCollection.h>
#include <edm4eic/ReconstructedParticleCollection.h>
#include <edm4hep/MCParticleCollection.h>
#include <spdlog/logger.h>
#include <memory>
#include <vector>

#include "ElectronReconstructionConfig.h"
#include "algorithms/interfaces/WithPodConfig.h"

namespace eicrecon {

    class ElectronReconstruction : public WithPodConfig<ElectronReconstructionConfig>{
    public:

        // Initialization will set the pointer of the logger
        void init(std::shared_ptr<spdlog::logger> logger);

        // Algorithm will apply E/p cut to reconstructed tracks based on truth track-cluster associations
        std::unique_ptr<edm4eic::ReconstructedParticleCollection> execute(
                const edm4hep::MCParticleCollection *mcparts,
                const edm4eic::ReconstructedParticleCollection *rcparts,
                const edm4eic::MCRecoParticleAssociationCollection *rcassoc,
                const std::vector<const edm4eic::MCRecoClusterParticleAssociationCollection*> &in_clu_assoc
        );
        
        // Could overload execute here to, for instance, use track projections 
        // for track-cluster association (instead of truth information)
        
    private:
        std::shared_ptr<spdlog::logger> m_log;  // pointer to logger
        double m_electron{0.000510998928};      // electron mass
    };
} // namespace eicrecon

Note that the algorithm’s parameters are enclosed in a Config object which lives at src/algorithms/reco/ElectronReconstructionConfig.h and can be accessed via the protected member variable m_cfg.

struct ElectronReconstructionConfig {

    double min_energy_over_momentum = 0.9;
    double max_energy_over_momentum = 1.2;

};

The algorithm itself lives at src/algorithms/reco/ElectronReconstruction.cc:


#include "ElectronReconstruction.h"

#include <edm4eic/ClusterCollection.h>
#include <edm4hep/utils/vector_utils.h>
#include <fmt/core.h>

namespace eicrecon {

  void ElectronReconstruction::init(std::shared_ptr<spdlog::logger> logger) {
    m_log = logger;
  }

  std::unique_ptr<edm4eic::ReconstructedParticleCollection> ElectronReconstruction::execute(
    const edm4hep::MCParticleCollection *mcparts,
    const edm4eic::ReconstructedParticleCollection *rcparts,
    const edm4eic::MCRecoParticleAssociationCollection *rcassoc,
    const std::vector<const edm4eic::MCRecoClusterParticleAssociationCollection*> &in_clu_assoc
    ) {

        // Step 1. Loop through MCParticle - cluster associations
        // Step 2. Get Reco particle for the Mc Particle matched to cluster
        // Step 3. Apply E/p cut using Reco cluster Energy and Reco Particle momentum

        // Some obvious improvements:
        // - E/p cut from real study optimized for electron finding and hadron rejection
        // - use of any HCAL info?
        // - check for duplicates?

        // output container
        auto out_electrons = std::make_unique<edm4eic::ReconstructedParticleCollection>();

        for ( const auto *col : in_clu_assoc ){ // loop on cluster association collections
          for ( auto clu_assoc : (*col) ){ // loop on MCRecoClusterParticleAssociation in this particular collection
            auto sim = clu_assoc.getSim(); // McParticle
            auto clu = clu_assoc.getRec(); // RecoCluster

            m_log->trace( "SimId={}, CluId={}", clu_assoc.getSimID(), clu_assoc.getRecID() );
            m_log->trace( "MCParticle: Energy={} GeV, p={} GeV, E/p = {} for PDG: {}", clu.getEnergy(), edm4hep::utils::magnitude(sim.getMomentum()), clu.getEnergy() / edm4hep::utils::magnitude(sim.getMomentum()), sim.getPDG() );


            // Find the Reconstructed particle associated to the MC Particle that is matched with this reco cluster
            // i.e. take (MC Particle <-> RC Cluster) + ( MC Particle <-> RC Particle ) = ( RC Particle <-> RC Cluster )
            auto reco_part_assoc = rcassoc->begin();
            for (; reco_part_assoc != rcassoc->end(); ++reco_part_assoc) {
              if (reco_part_assoc->getSimID() == (unsigned) clu_assoc.getSimID()) {
                break;
              }
            }

            // if we found a reco particle then test for electron compatibility
            if ( reco_part_assoc != rcassoc->end() ){
              auto reco_part = reco_part_assoc->getRec();
              double EoverP = clu.getEnergy() / edm4hep::utils::magnitude(reco_part.getMomentum());
              m_log->trace( "ReconstructedParticle: Energy={} GeV, p={} GeV, E/p = {} for PDG (from truth): {}", clu.getEnergy(), edm4hep::utils::magnitude(reco_part.getMomentum()), EoverP, sim.getPDG() );

              // Apply the E/p cut here to select electons
              if ( EoverP >= m_cfg.min_energy_over_momentum && EoverP <= m_cfg.max_energy_over_momentum ) {
                out_electrons->push_back(reco_part.clone());
              }

            } else {
              m_log->debug( "Could not find reconstructed particle for SimId={}", clu_assoc.getSimID() );
            }

          } // loop on MC particle to cluster associations in collection
        } // loop on collections

        m_log->debug( "Found {} electron candidates", out_electrons->size() );
        return out_electrons;
    }

}

Key Points