Event Analysis Framework

Khristian Kotov

Introduction

The project provides an environment for building analysis pipelines for event-type data

It consists of the core package managing the pipeline and packages with processing modules:

Package              Functionality
AppFramework (core)  initialization, scheduling, and messaging between the processing modules
AnUtils              generic algorithms, regexp, interactive formulae interpretation
AnModBasic           input and output modules
Logistics            database of samples, reweighting schemes, scaling factors, etc.
AnObjects            analysis objects comprising an event
Readers              collection of modules reading the EDM (CMS specific) objects
workdir              working directory with a short tutorial

The code can be installed with git clone http://github.com/koskot77/framework.git

It can be tested with make && cd workdir/tutorial && make && python test.py

Analysis concept

A typical pipeline can be specified in a Python script (test.py) or a C++ program (test.cc)

The simplest example shows a pipeline built with just one example module, MyModule1:

sequence = AppFramework('MyModule1')
sequence.beginJob()
sequence.process(10)
sequence.endJob()

Initialization is done in beginJob, and "harvesting" is associated with endJob

Both calls map trivially to the MyModule1::beginJob and MyModule1::endJob methods

Processing of events (just 10 here) leads to a series of calls to the MyModule1::event method

MyModule1 inherits from the AppModule class and implements the beginJob, beginRun, event, endRun, and endJob methods
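
The call sequence described above can be illustrated with a small pure-Python sketch. It does not use the framework itself: SketchModule, CountingModule, and SketchSequence are hypothetical stand-ins, and only the hook names (beginJob, beginRun, event, endRun, endJob) are taken from the text.

```python
# Hypothetical pure-Python stand-ins; not the framework's actual classes.
class SketchModule:
    """Base class with the five hooks named in the text (all no-ops here)."""
    def beginJob(self): pass
    def beginRun(self): pass
    def event(self, number): pass
    def endRun(self): pass
    def endJob(self): pass


class CountingModule(SketchModule):
    """Example module that records which hooks were called."""
    def __init__(self):
        self.calls = []

    def beginJob(self):
        self.calls.append("beginJob")

    def event(self, number):
        self.calls.append(f"event {number}")

    def endJob(self):
        self.calls.append("endJob")


class SketchSequence:
    """Toy driver that forwards each call to every module, in order."""
    def __init__(self, modules):
        self.modules = modules

    def beginJob(self):
        for m in self.modules:
            m.beginJob()

    def process(self, n_events):
        for number in range(n_events):
            for m in self.modules:
                m.event(number)

    def endJob(self):
        for m in self.modules:
            m.endJob()


module = CountingModule()
sequence = SketchSequence([module])
sequence.beginJob()
sequence.process(3)
sequence.endJob()
```

After the run, module.calls holds one beginJob entry, one entry per processed event, and one endJob entry, mirroring the mapping of sequence calls to module methods.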

Execution flow

The return type of the five methods, AppResult, implements an exception mechanism:

Framework behavior:

Flag             Action
AppResult::OK    keep calm and carry on
AppResult::SKIP  ignore the rest of the pipeline and move on to the next event (= continue)
AppResult::STOP  stop processing (= break)
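
How a scheduler might act on these three flags can be sketched in plain Python. This is an illustration only, not the framework's scheduler: the Flag enum, run_pipeline, and the three toy modules are hypothetical names introduced here.

```python
from enum import Enum


class Flag(Enum):
    """Hypothetical stand-in for AppResult::OK / SKIP / STOP."""
    OK = 0
    SKIP = 1
    STOP = 2


def run_pipeline(modules, n_events):
    """Run (name, function) modules per event; return the (event, name) calls made."""
    trace = []
    for number in range(n_events):
        stop = False
        for name, func in modules:
            trace.append((number, name))
            flag = func(number)
            if flag is Flag.SKIP:   # drop the rest of the pipeline for this event
                break
            if flag is Flag.STOP:   # drop the rest and end the event loop
                stop = True
                break
        if stop:
            break
    return trace


def select_even(number):
    return Flag.OK if number % 2 == 0 else Flag.SKIP


def stop_at_four(number):
    return Flag.STOP if number == 4 else Flag.OK


def fill(number):
    return Flag.OK


trace = run_pipeline(
    [("select", select_even), ("limit", stop_at_four), ("fill", fill)], 10
)
```

In this sketch odd events reach only the first module, and processing ends at event 4 before the last module is called, even though 10 events were requested.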


The payload message appears in one of the following streams, depending on the severity flag:

Flag                Stream
AppResult::LOG      clog
AppResult::WARNING  cout
AppResult::ERROR    cerr
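
The routing of messages by severity can be sketched as a simple lookup. The three in-memory buffers below are hypothetical stand-ins for the C++ streams, and the report function is an assumed helper, not part of the framework.

```python
import io

# Hypothetical in-memory stand-ins for the three C++ streams.
streams = {"clog": io.StringIO(), "cout": io.StringIO(), "cerr": io.StringIO()}

# Severity-to-stream mapping taken from the flags listed above.
SEVERITY_STREAM = {"LOG": "clog", "WARNING": "cout", "ERROR": "cerr"}


def report(severity, message):
    """Write the payload message to the stream selected by its severity."""
    stream = streams[SEVERITY_STREAM[severity]]
    stream.write(f"{severity}: {message}\n")


report("LOG", "opened file")
report("WARNING", "missing branch")
report("ERROR", "corrupted event")
```

Keeping the mapping in one dictionary makes it easy to see, or to mute, where each class of message ends up.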

Parameters

A module can define named parameters of any type; these can be modified as follows:

sequence = AppFramework('MyModule4')
sequence.modify("MyModule4::mystring","Simple string")
sequence.modify("MyModule4::mydouble","999")
sequence.beginJob()

The mystring and mydouble objects in MyModule4 are defined using the AppParameter<T> wrapper, which behaves as an object of type T, provides an additional toString() method, and can be initialized from a string

For user-defined types one has to provide a template specialization with the two string conversion functions
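
The idea of a parameter that is set from a string and converted back can be sketched in Python. SketchParameter is a hypothetical analogue, not the AppParameter<T> implementation; the range type and its "low:high" text format are assumptions made for the example.

```python
class SketchParameter:
    """Hypothetical analogue of a typed parameter that can be set from a string."""

    def __init__(self, value, from_string=None, to_string=str):
        self.value = value
        # By default, reuse the type of the initial value as the parser.
        self._from_string = from_string or type(value)
        self._to_string = to_string

    def set_from_string(self, text):
        self.value = self._from_string(text)

    def toString(self):
        return self._to_string(self.value)


# Built-in types work with the default conversions.
mydouble = SketchParameter(0.0)
mydouble.set_from_string("999")


# A user type needs its own pair of string functions (assumed "low:high" format).
def parse_range(text):
    low, high = text.split(":")
    return (float(low), float(high))


def format_range(rng):
    return f"{rng[0]}:{rng[1]}"


myrange = SketchParameter((0.0, 1.0), parse_range, format_range)
myrange.set_from_string("2.5:7")
```

The pair parse_range/format_range plays the role of the two string functions that a user type has to supply.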

Data flow

Two types of data are supported: (1) "by event" and (2) "on demand"

In the following example, the counter and obj data members are modified in MyModule2::event and passed to downstream consumers under the counter1 and counter2 labels via the AppEvent container:

sequence = AppFramework('MyModule2->MyModule3')
sequence.beginJob()
sequence.process(10)

The MyModule3 consumer shows how to pull out the two objects (note the const qualifier!)

The counter3 object is an example of "on demand" data owned by a separate piece of code that is registered at initialization time

The type of the "on demand" data is declared as a template argument of the AppAgent base class

Requesting "on demand" data from the AppEvent container triggers the fetch or set methods
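
The difference between the two kinds of data can be sketched with a toy container. SketchEvent and fetch_counter3 are hypothetical names; the sketch covers only the fetch direction and does not reproduce the AppEvent or AppAgent interfaces.

```python
class SketchEvent:
    """Hypothetical event container holding per-event and on-demand data."""

    def __init__(self):
        self._data = {}     # "by event" objects put by producer modules
        self._agents = {}   # label -> callable producing "on demand" data

    def put(self, label, value):
        self._data[label] = value

    def register(self, label, fetch):
        self._agents[label] = fetch

    def get(self, label):
        if label in self._data:
            return self._data[label]
        # Not stored: ask the registered agent to produce the value now.
        return self._agents[label]()


fetch_calls = []


def fetch_counter3():
    fetch_calls.append("fetch")
    return 100 + len(fetch_calls)


event = SketchEvent()
event.register("counter3", fetch_counter3)  # done once, at initialization
event.put("counter1", 1)                    # done by a producer module per event

first = event.get("counter1")    # plain lookup, no agent involved
second = event.get("counter3")   # triggers fetch_counter3
```

The consumer uses the same get call in both cases; only the second request runs code outside the container.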

Toolbox

MyModule5 shows how to build a simple mathematical expression:

sequence = AppFramework('MyModule5')
sequence.modify("MyModule5::formula","( val1+ val2) *val3")
sequence.process(10)

In this example, the formula container is populated with several random variables and the given formula is evaluated

The RegEx utility assists in using regular expressions

The NchooseM functions help to avoid many nested loops
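
Evaluating a formula string over named variables can be sketched with Python's standard ast module. This is not the framework's interpreter: the evaluate function is hypothetical, supports only the four basic arithmetic operators, and uses fixed values in place of the random variables.

```python
import ast
import operator

# Binary operators supported by this sketch.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(formula, variables):
    """Evaluate an arithmetic formula over a dictionary of named variables."""

    def visit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return visit(ast.parse(formula.strip(), mode="eval").body)


# Fixed values stand in for the random variables of the example.
result = evaluate("( val1+ val2) *val3", {"val1": 1.0, "val2": 2.0, "val3": 4.0})
```

Walking the parsed tree instead of calling eval keeps the sketch limited to arithmetic on the supplied variables.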
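
Assuming NchooseM enumerates the ways of choosing M items out of N (an assumption based on its name, not on the source code), the effect on nested loops can be sketched with Python's itertools.combinations. The jet labels are hypothetical.

```python
from itertools import combinations

jets = ["j1", "j2", "j3", "j4"]  # hypothetical object labels

# Nested-loop version: every unordered triple of distinct jets.
nested = []
for i in range(len(jets)):
    for j in range(i + 1, len(jets)):
        for k in range(j + 1, len(jets)):
            nested.append((jets[i], jets[j], jets[k]))

# Same triples from a single loop over combinations.
flat = list(combinations(jets, 3))
```

The single-loop form stays the same when the number of selected objects changes, whereas the nested form needs one more loop per object.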

A few more details

Verbosity: one can switch off and on cout, clog and cerr streams; for example:

sequence.verbose("AppFramework","cout off")
sequence.verbose("MyModule5","cout off")

The first module can also inherit from the AppInputModule class, which lets it call the beginRun and endRun methods of all the modules via the beginRunNotify and endRunNotify calls

Persistence of the AppEvent container is also supported, but requires serializer/deserializer functions to be provided for every storable data type (rarely needed, not explained here)

All modules are named and registered in the Using.h file

It is often practical to register the "on demand" data directly in Using.h rather than nesting them in some module, as was shown in the examples above

List of CMS specific packages and modules

The framework's code was written around the year 2000 and used for analyses with the KEDR detector

It also worked well with the CMS detector using the following data processing code:

  • AnModBasic/InputModule - module that reads Events and ParameterSets branches from a ROOT tree of a typical CMS EDM file
  • AnModBasic/OutputModule - module that writes selected variables (integers and doubles) into CSV files and flat ROOT trees
  • Logistics/SampleHelper - dictionary that associates a sample name with the physical location of files for the InputModule
  • AnObjects - analysis objects comprising a particle physics event (leptons, jets, triggers...)
  • Readers - collection of modules that read CMS specific physics objects

A real-world example is available in workdir