This lesson is being piloted (Beta version)

Creating or modifying a JANA factory in order to implement a reconstruction algorithm

Overview

Teaching: 15 min
Exercises: 20 min
Questions
  • How to write a reconstruction algorithm in EICrecon?

Objectives
  • Learn how to create a new factory in EICrecon that supplies a reconstruction algorithm for all to use.

  • Understand the directory structure for where the factory should be placed in the source tree.

  • Understand how to use a generic algorithm in a JANA factory.

Note: The following episode presents a somewhat outdated view, and some commands may not function. If you are only interested in analyzing already-reconstructed data, then there is no requirement to use a plugin as described below; the output ROOT file works too.

Introduction

Now that you’ve learned about JANA plugins and JEventProcessors, let’s talk about JFactories. JFactories are another essential JANA component just like JEventProcessors and JEventSources. While JEventProcessors are used for aggregating results from each event into a structured output such as a histogram or a file, JFactories are used for computing those results in an organized way.

When do I use a JFactory?

Why should I prefer writing a JFactory?

  1. They make your code reusable. Different people can use your results later without having to understand the specifics of what you did.

  2. If you are consuming some data which doesn’t look right to you, JFactories make it extremely easy to pinpoint exactly which code produced this data.

  3. EICrecon needs to run multithreaded, and using JFactories can help steer you away from introducing thorny parallelism bugs.

  4. You can simply ask for the results you need and the JFactory will provide it. If nobody needs the results from the JFactory, it won’t be run. If the results were already in the input file, it won’t be run. If there are multiple consumers, the results are only computed once and then cached. If the JFactory relies on results from other JFactories, it will call them transparently and recursively.

When do I create my own plugin?

Algorithms vs Factories

In general, a Factory is a programming pattern for constructing objects in an abstract way. Oftentimes, the Factory is calling an algorithm under the hood. This algorithm may be very generic. For instance, we may have a Factory that produces Cluster objects for a barrel calorimeter, and it calls a clustering algorithm that doesn’t care at all about barrel calorimeters, just the position and energy of energy of each CalorimeterHit object. Perhaps multiple factories for creating clusters for completely different detectors are all using the same algorithm.

Note that Gaudi provides an abstraction called “Algorithm” which is essentially its own version of a JFactory. In EICrecon, we have been separating out generic algorithms from the old Gaudi and new JANA code so that these can be developed and tested independently. To see an example of how a generic algorithm is being implemented, look at these examples:

src/detectors/EEMC/RawCalorimeterHit_factory_EcalEndcapNRawHits.h src/algorithms/calorimetry/CalorimeterHitDigi.h src/algorithms/calorimetry/CalorimeterHitDigi.cc

Using generic algorithms makes things slightly more complex. However, the generic algorithms can be recycled for use in multiple detector systems which adds some simplification.

Parallelism considerations

JEventProcessors observe the entire event stream, and require a critical section where only one thread is allowed to modify a shared resource (such as a histogram) at any time. JFactories, on the other hand, only observe a single event at a time, and work on each event independently. Each worker thread is given an independent event with its own set of factories. This means that for a given JFactory instance, there will be only one thread working on one event at any time. You get the benefits of multithreading without having to make each JFactory thread-safe.

You can write JFactories in an almost-functional style, but you can also cache some data on the JFactory that will stick around from event-to-event. This is useful for things like conditions and geometry data, where for performance reasons you don’t want to be doing a deep lookup on every event. Instead, you can write callbacks such as BeginRun(), where you can update your cached values when the run number changes.

Note that just because the JFactory can be called in parallel doesn’t mean it always will. If you call event->Get() from inside JEventProcessor::ProcessSequential, in particular, the factory will run single-threaded and slow everything down. However, if you call it using Prefetch instead, it will run in parallel and you may get a speed boost.

How do I use an existing JFactory?

Using an existing JFactory is extremely easy! Any time you are someplace where you have access to a JEvent object, do this:


auto clusters = event->Get<edm4eic::Cluster>("EcalEndcapNIslandClusters");

for (auto c : clusters) {
  // ... do something with a cluster
}

As you can see, it doesn’t matter whether the Cluster objects were calculated from some simpler objects, or were simply loaded from a file. This is a very powerful concept.

One thing we might want to do is to swap one factory for another, possibly even at runtime. This is easy to do if you just make the factory tag be a parameter:


std::string my_cluster_source = "EcalEndcapNIslandClusters";  // Make this be a parameter
app->SetDefaultParameter("MyPlugin:MyAnalysis:my_cluster_source", my_cluster_source, "Cluster source for MyAnalysis");
auto clusters = event->Get<edm4eic::Cluster>(my_cluster_source);

How do I create a new JFactory?

We are going to add a new JFactory inside EICrecon.

src/detectors/EEMC/Cluster_factory_EcalEndcapNIslandClusters.h:

#pragma once

#include <edm4eic/Cluster.h>
#include <JANA/JFactoryT.h>
#include <services/log/Log_service.h>

class Cluster_factory_EcalEndcapNIslandClusters : public JFactoryT<edm4eic::Cluster> {
public:

    Cluster_factory_EcalEndcapNIslandClusters(); // Constructor

    void Init() override;  
    // Gets called exactly once at the beginning of the JFactory's life

    void ChangeRun(const std::shared_ptr<const JEvent> &event) override {};
    // Gets called on events where the run number has changed (before Process())

    void Process(const std::shared_ptr<const JEvent> &event) override;
    // Gets called on every event

    void Finish() override {};
    // Gets called exactly once at the end of the JFactory's life

private:
    float m_scaleFactor;
    std::shared_ptr<spdlog::logger> m_log;

};

src/detectors/EEMC/Cluster_factory_EcalEndcapNIslandClusters.cc:

#include "Cluster_factory_EcalEndcapNIslandClusters.h"

#include <edm4eic/ProtoCluster.h>
#include <JANA/JEvent.h>


Cluster_factory_EcalEndcapNIslandClusters::Cluster_factory_EcalEndcapNIslandClusters() {

    SetTag("EcalEndcapNIslandClusters");
}


void Cluster_factory_EcalEndcapNIslandClusters::Init() {
    auto app = GetApplication();

    // This is an example of how to declare a configuration parameter that
    // can be set at run time. e.g. with -PEEMC:EcalEndcapNIslandClusters:scaleFactor=0.97
    m_scaleFactor =0.98;
    app->SetDefaultParameter("EEMC:EcalEndcapNIslandClusters:scaleFactor", m_scaleFactor, "Energy scale factor");

    // This is how you access shared resources using the JService interface
    m_log = app->GetService<Log_service>()->logger("EcalEndcapNIslandClusters");
}


void Cluster_factory_EcalEndcapNIslandClusters::Process(const std::shared_ptr<const JEvent> &event) {

    m_log->info("Processing event {}", event->GetEventNumber());

    // Grab inputs
    auto protoclusters = event->Get<edm4eic::ProtoCluster>("EcalEndcapNIslandProtoClusters");

    // Loop over protoclusters and turn each into a cluster
    std::vector<edm4eic::Cluster*> outputClusters;
    for( auto proto : protoclusters ) {

        // ======================
        // Algorithm goes here!
        // ======================

        auto cluster = new edm4eic::Cluster(
            0, // type 
            energy * m_scaleFactor,
            sqrt(energyError_squared),
            time,
            timeError,
            proto->hits_size(),
            position,
            edm4eic::Cov3f(), // positionError,
            0.0,              // intrinsicTheta,
            0.0,              // intrinsicPhi,
            edm4eic::Cov2f()  // intrinsicDirectionError
            );

        outputClusters.push_back( cluster );
    }

    // Hand ownership of algorithm objects over to JANA
    Set(outputClusters);
}


We can now fill in the algorithm with anything we like!

        // Grab inputs
        auto protoclusters = event->Get<edm4eic::ProtoCluster>("EcalEndcapNIslandProtoClusters");

        // Loop over protoclusters and turn each into a cluster
        std::vector<edm4eic::Cluster*> outputClusters;
        for( auto proto : protoclusters ) {

            // Fill cumulative values by looping over all hits in proto cluster
            float energy = 0;
            double energyError_squared = 0.0;
            float time = 1.0E8;
            float timeError;
            edm4hep::Vector3f position;
            double sum_weights = 0.0;
            for( uint32_t ihit=0; ihit<proto->hits_size() ; ihit++){
                auto const &hit = proto->getHits(ihit);
                auto weight = proto->getWeights(ihit);
                energy += hit.getEnergy();
                energyError_squared += std::pow(hit.getEnergyError(), 2.0);
                if( hit.getTime() < time ){
                    time = hit.getTime();            // use earliest time
                    timeError = hit.getTimeError();  // use error of earliest time
                }
                auto &p = hit.getPosition();
                position.x += p.x*weight;
                position.y += p.y*weight;
                position.z += p.z*weight;
                sum_weights += weight;
            }
            
            // Normalize position
            position.x /= sum_weights;
            position.y /= sum_weights;
            position.z /= sum_weights;

            // Create a cluster object from values accumulated from hits above
            auto cluster = new edm4eic::Cluster(
                0, // type (?))
                energy * m_scaleFactor,
                sqrt(energyError_squared),
                time,
                timeError,
                proto->hits_size(),
                position,

                // Not sure how to calculate these last few
                edm4eic::Cov3f(), // positionError,
                0.0, // intrinsicTheta,
                0.0, // intrinsicPhi,
                edm4eic::Cov2f() // intrinsicDirectionError
                );

            outputClusters.push_back( cluster );
        }

        // Hand ownership of algorithm objects over to JANA
        Set(outputClusters);

You can’t pass JANA a JFactory directly (because it needs to create an arbitrary number of them on the fly). Instead you register a JFactoryGenerator object:

src/detectors/EEMC/EEMC.cc


// In your plugin's init

#include <JANA/JFactoryGenerator.h>
// ...
#include "Cluster_factory_EcalEndcapNIslandClusters.h"

extern "C" {
    void InitPlugin(JApplication *app) {
        InitJANAPlugin(app);
        // ...

        app->Add(new JFactoryGeneratorT<Cluster_factory_EcalEndcapNIslandClusters>());
     }

Finally, we go ahead and trigger the factory (remember, factories won’t do anything unless activated by a JEventProcessor). You can open the

eicrecon in.root -Ppodio:output_file=out.root -Ppodio:output_collections=EcalEndcapNIslandClusters -Pjana:nevents=10

Your exercise is to get this JFactory working! You can tweak the algorithm, add log messages, add additional config parameters, etc.

Key Points

  • Create a factory for reconstructing single subdetector data or for global reconstruction.