This lesson is in the early stages of development (Alpha version)

Exercise 3: Filling out your benchmark

Overview

Teaching: 20 min
Exercises: 10 min
Questions
  • How do we fill in each stage of the benchmark pipeline?

Objectives
  • Fill out the many steps of your benchmark

  • Collect templates for the benchmark stages

In this lesson we will be beefing up our benchmark by filling out several of the pipeline stages.

Setting up

Before filling out the stages for GitLab’s CI pipelines, we first want to create a file that contains some settings used by our benchmark.

Create a new file benchmarks/your_benchmark/setup.config with the following contents:

#!/bin/bash
source strict-mode.sh

export ENV_MODE=eicweb

USE_SIMULATION_CAMPAIGN=true

# number of events to simulate when not using the simulation campaign
N_EVENTS=100

# which reconstruction to run in reconstruct.sh: "eicrecon" or "juggler"
RECO=eicrecon

# input events and file names used throughout the benchmark
FILE_BASE=sim_output/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree
INPUT_FILE=root://dtn-eic.jlab.org//work/eic2/EPIC/EVGEN/EXCLUSIVE/UCHANNEL_RHO/10x100/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree.root
OUTPUT_FILE=${FILE_BASE}.detectorsim.root

REC_FILE_BASE=${FILE_BASE}.detectorsim.edm4eic
REC_FILE=${REC_FILE_BASE}.root

The export ENV_MODE=eicweb lets our Snakefile know to use the paths for running on eicweb.

Here we’ve defined a switch, USE_SIMULATION_CAMPAIGN, which lets us alternate between using output from the simulation campaign and simulating new events on the fly.

When not using the simulation campaign, the N_EVENTS variable sets how many events the benchmark should simulate, and RECO selects which reconstruction the reconstruct.sh script runs (we use eicrecon). The remaining variables define the file names used throughout the benchmark.
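
Any script that sources setup.config can branch on this switch. As a minimal sketch (the pipeline stages below use exactly this pattern):

if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
  echo "Using simulation campaign output"
else
  echo "Simulating ${N_EVENTS} new events"
fi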

Also create a new file benchmarks/your_benchmark/simulate.sh with the following contents:

#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

if [ -f "${INPUT_FILE}" ]; then
  echo "GOOD: Input simulation file ${INPUT_FILE} exists!"
else
  echo "ERROR: Input simulation file ${INPUT_FILE} does not exist."
fi

# Simulate
ddsim --runType batch \
      -v WARNING \
      --numberOfEvents ${N_EVENTS} \
      --part.minimalKineticEnergy 100*GeV  \
      --filter.tracker edep0 \
      --compactFile ${DETECTOR_PATH}/${DETECTOR_CONFIG}.xml \
      --inputFiles ${INPUT_FILE} \
      --outputFile  ${OUTPUT_FILE}
if [[ "$?" -ne "0" ]] ; then
  echo "ERROR running ddsim"
  exit 1
fi

This script uses ddsim to simulate the detector response to your benchmark events.
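
If you run this step by hand, you can confirm that the simulation output was written by listing its trees with rootls from ROOT (the reconstruct.sh script below applies the same check to the reconstructed file):

rootls -t ${OUTPUT_FILE}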

Create a script named benchmarks/your_benchmark/reconstruct.sh to manage the reconstruction:

#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

# Reconstruct
if [[ "${RECO}" == "eicrecon" ]] ; then
  eicrecon ${OUTPUT_FILE} -Ppodio:output_file=${REC_FILE}
  if [[ "$?" -ne "0" ]] ; then
    echo "ERROR running eicrecon"
    exit 1
  fi
fi

if [[ ${RECO} == "juggler" ]] ; then
  gaudirun.py options/reconstruction.py || [ $? -eq 4 ]
  if [ "$?" -ne "0" ] ; then
    echo "ERROR running juggler"
    exit 1
  fi
fi

if [ -f jana.dot ] ; then cp jana.dot ${REC_FILE_BASE}.dot ; fi

rootls -t ${REC_FILE}

Create a file called benchmarks/your_benchmark/analyze.sh which will run the analysis and plotting scripts:

#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

OUTPUT_PLOTS_DIR=sim_output/nocampaign
mkdir -p ${OUTPUT_PLOTS_DIR}
# Analyze
command time -v \
root -l -b -q "benchmarks/your_benchmark/analysis/uchannelrho.cxx(\"${REC_FILE}\",\"${OUTPUT_PLOTS_DIR}/plots.root\")"
if [[ "$?" -ne "0" ]] ; then
  echo "ERROR analysis failed"
  exit 1
fi

if [ ! -d "${OUTPUT_PLOTS_DIR}/plots_figures" ]; then
    mkdir "${OUTPUT_PLOTS_DIR}/plots_figures"
    echo "${OUTPUT_PLOTS_DIR}/plots_figures directory created successfully."
else
    echo "${OUTPUT_PLOTS_DIR}/plots_figures directory already exists."
fi
root -l -b -q "benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C(\"${OUTPUT_PLOTS_DIR}/plots.root\")"
cat benchmark_output/*.json

Let’s copy over our analysis script, our plotting macro & header, and our Snakefile:

mkdir benchmarks/your_benchmark/analysis
mkdir benchmarks/your_benchmark/macros

cp ../starting_script/Snakefile benchmarks/your_benchmark/
cp ../starting_script/analysis/uchannelrho.cxx benchmarks/your_benchmark/analysis/
cp ../starting_script/macros/RiceStyle.h benchmarks/your_benchmark/macros/
cp ../starting_script/macros/plot_rho_physics_benchmark.C benchmarks/your_benchmark/macros/

Your benchmark directory should now look like this:

benchmarks/your_benchmark/
├── Snakefile
├── analysis
│   └── uchannelrho.cxx
├── analyze.sh
├── config.yml
├── macros
│   ├── RiceStyle.h
│   └── plot_rho_physics_benchmark.C
├── reconstruct.sh
├── setup.config
└── simulate.sh
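
At this point you can already exercise the stages by hand from the top of the repository, inside eic-shell where DETECTOR_PATH and DETECTOR_CONFIG are set (a sketch; these are the same calls the pipeline configuration below will make):

bash benchmarks/your_benchmark/simulate.sh
bash benchmarks/your_benchmark/reconstruct.sh
bash benchmarks/your_benchmark/analyze.sh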

In order to use your Snakefile, you need to let GitLab know it’s there. Open the main Snakefile: not the one you just copied to benchmarks/your_benchmark/Snakefile, but the one at the same level as the benchmarks directory.

Go to the very end of the file and include a path to your own Snakefile:

include: "benchmarks/diffractive_vm/Snakefile"
include: "benchmarks/dis/Snakefile"
include: "benchmarks/demp/Snakefile"
include: "benchmarks/your_benchmark/Snakefile"

Once that’s all set up, we can move on to actually adding these stages to our pipeline!

The “simulate” pipeline stage

We now fill out the simulate stage in GitLab’s pipelines. Currently the instructions for this job in benchmarks/your_benchmark/config.yml should read:

your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  script:
    - echo "I will simulate detector response here!"

In order to make sure the previous stages finish before this one starts, add a new line below stage: simulate with needs: ["common:setup"].

This step can take a long time if you simulate too many events, so let’s put an upper limit of 10 hours on the allowed run time: on a new line below needs: ["common:setup"], add timeout: 10 hour.

Now in the script section of the rule, add two new lines to source the setup.config file:

    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file

Add instructions so that if we’re using the simulation campaign, the detector simulation is skipped; otherwise, run the simulation and the reconstruction:

    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign so skipping this step!"
    - else
    -     echo "Grabbing raw events from S3 and running Geant4"
    -     bash benchmarks/your_benchmark/simulate.sh
    -     echo "Geant4 simulations done! Starting eicrecon now!"
    -     bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"

Finally, add an instruction to retry the simulation if it fails:

  retry:
    max: 2
    when:
      - runner_system_failure

The final simulate rule should look like this:

your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  needs: ["common:setup"]
  timeout: 10 hour
  script:
    - echo "I will simulate detector response here!"
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    - else
    -     echo "Grabbing raw events from S3 and running Geant4"
    -     bash benchmarks/your_benchmark/simulate.sh
    -     echo "Geant4 simulations done! Starting eicrecon now!"
    -     bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
  retry:
    max: 2
    when:
      - runner_system_failure

The “results” pipeline stage

The results stage in config.yml is right now just this:

your_benchmark:results:
  extends: .phy_benchmark
  stage: collect
  script:
    - echo "I will collect results here!"

Specify that we need to finish the simulate stage first:

  needs:
    - "your_benchmark:simulate"

Now make two directories to contain output from the benchmark analysis and source setup.config again:

    - mkdir -p results/your_benchmark
    - mkdir -p benchmark_output
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file

If using the simulation campaign, we can request the rho mass benchmark with snakemake. Once snakemake has finished creating the benchmark figures, we copy them over to results/your_benchmark/ in order to make them into artifacts:

    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    -     snakemake --cores 2 ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/benchmark_rho_mass.pdf
    -     cp ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/*.pdf results/your_benchmark/

If not using the simulation campaign, we can just run the analyze.sh script and copy the results into results/your_benchmark/ in order to make them into artifacts:

    - else
    -     echo "Not using simulation campaign!"
    -     bash benchmarks/your_benchmark/analyze.sh
    -     cp sim_output/nocampaign/plots_figures/*.pdf results/your_benchmark/
    - fi
    - echo "Finished copying!"

Your final config.yml should look like:

your_benchmark:compile:
  extends: .phy_benchmark 
  stage: compile
  script:
    - echo "You can compile your code here!"

your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  needs: ["common:setup"]
  timeout: 10 hour
  script:
    - echo "Simulating everything here!"
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    - else
    -     echo "Grabbing raw events from S3 and running Geant4"
    -     bash benchmarks/your_benchmark/simulate.sh
    -     echo "Geant4 simulations done! Starting eicrecon now!"
    -     bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
  retry:
    max: 2
    when:
      - runner_system_failure

your_benchmark:results:
  extends: .phy_benchmark
  stage: collect
  needs:
    - "your_benchmark:simulate"
  script:
    - echo "I will collect results here!"
    - mkdir -p results/your_benchmark
    - mkdir -p benchmark_output
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    -     snakemake --cores 2 ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/benchmark_rho_mass.pdf
    -     cp ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/*.pdf results/your_benchmark/
    - else
    -     echo "Not using simulation campaign!"
    -     bash benchmarks/your_benchmark/analyze.sh
    -     cp sim_output/nocampaign/plots_figures/*.pdf results/your_benchmark/
    - fi
    - echo "Finished copying!"

Testing Real Pipelines

We’ve set up our benchmark to do some real analysis! As a first test, let’s make sure we’re still running only over the simulation campaign: the USE_SIMULATION_CAMPAIGN variable in setup.config should be set to true.
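
You can double-check this from the repository root with grep:

grep USE_SIMULATION_CAMPAIGN benchmarks/your_benchmark/setup.config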

Now let’s add our changes and push them to GitHub!

git status

This command should show the main Snakefile as modified and the new files under benchmarks/your_benchmark/ as untracked.

Now add all our changes:

git add Snakefile
git add benchmarks/your_benchmark/config.yml
git add benchmarks/your_benchmark/Snakefile
git add benchmarks/your_benchmark/analysis/uchannelrho.cxx 
git add benchmarks/your_benchmark/analyze.sh
git add benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C 
git add benchmarks/your_benchmark/macros/RiceStyle.h 
git add benchmarks/your_benchmark/reconstruct.sh
git add benchmarks/your_benchmark/setup.config
git add benchmarks/your_benchmark/simulate.sh

git commit -m "I'm beefing up my benchmark!"
git push origin pr/your_benchmark_<mylastname>

Now monitor the pipeline you created in GitLab’s pipeline view.

Key Points

  • Create setup.config to switch between using the simulation campaign and re-simulating events

  • Each stage of the benchmark pipeline is defined in config.yml

  • config.yml takes normal bash scripts as input

  • Copy resulting figures over to the results directory to turn them into artifacts