track AGATA – bootcamp <data/dev>

dataset and goals

The AGATA dataset is based on a simulation of an in-beam fission experiment. In this experiment, a heavy beam impinges on a target and undergoes fission in flight. One of the two fission fragments, chosen at random, is detected by a zero-degree spectrometer, which provides its A and Z as well as its beta and direction. In addition, an HPGe array detects the gamma rays emitted in flight.

The goal is to produce Doppler-corrected gamma-ray emission spectra for a set of nuclei selected by the users. For this, the students work in pairs as Pilot/Copilot. A group is typically 4 to 5 students, so the course is designed with 2 pairs in mind. This can be adjusted depending on the actual configuration of your school.

The list proposed here is the minimal set of steps needed to complete the course. It is organised in sprints, which are structured sessions with explicit goals. At the beginning of a sprint, the students discuss the goals with the tutors, then the tutors create the git issues. You will find those issues listed under each sprint below. During the sprint, tutors can review code before a merge request is accepted and create new issues for the students to follow. Adjust the level of detail in the git issues to the students' level of autonomy.

The dataset can be downloaded from Zenodo: https://doi.org/10.5281/zenodo.18253801

sprint 0 : Warming up!

This sprint is a simple warm-up with no programming required. It helps the students get used to the sprint process.

Add a conda.yaml file to allow conda/mamba/micromamba environment installation.

Add a LICENSE file to the repository. Choose an appropriate open source license.

To discuss with the students at the end of the sprint:

Check that they added proper instructions to the README file
Discuss the LICENSE choice

sprint 1 : Introduction to data analysis

In this sprint the students get on board with the data analysis. They have to add variables to the output of the base analysis.

The analysis code does not process the fragment information on angle and velocity.

The analysis code does not process the gamma information.

sprint 2 : Creating a basic test

In this sprint the students produce a test data file and an associated test program that could be incorporated into a future CI/CD pipeline.

A small sample file needs to be provided for test purposes.

Create a small test file with the first 100 events of run1. Put the file in the resources folder and name it sample_raw.root.

A test program is needed to check that we can read and write data files. The program will read the data and copy it to a new file, then open both files and check that they are equal:
      - create a test folder to contain the unit test
      - edit CMakeLists.txt to make it compile
      - develop the program

To discuss at the end of the sprint:

Check that the README file contains appropriate instructions for running the test.

Add a README file in the test folder detailing what the test does, how to run it, and the possible outcomes.

sprint 3 : Advanced analysis

In this sprint the students implement actual data manipulation. The goal is to calibrate the data in energy and apply the Doppler correction. RUN1 is already calibrated (no parameters needed), but subsequent runs need the calibration provided in the metadata files.

Implement the Doppler correction in the class (in the datalib folder) and call it in the analysis (in the utility folder).
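
For tutors who want a reference to check student results against, the relativistic Doppler correction can be sketched as below. This is a minimal Python sketch, not the course implementation: the function and argument names are hypothetical, and theta is assumed to be the laboratory angle between the fragment direction and the gamma-ray direction.

```python
import math


def cos_angle(u, v):
    """Cosine of the angle between two 3-vectors given as (x, y, z) tuples."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def doppler_correct(e_lab, beta, cos_theta):
    """Return the gamma-ray energy in the rest frame of the emitting fragment.

    e_lab     : energy measured in the laboratory frame
    beta      : fragment velocity in units of c (0 <= beta < 1)
    cos_theta : cosine of the lab angle between fragment and gamma directions
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    # E_rest = E_lab * gamma * (1 - beta * cos(theta))
    return e_lab * gamma * (1.0 - beta * cos_theta)
```

A gamma ray emitted along the fragment direction is measured at a higher energy in the laboratory, so the correction lowers it; at beta = 0 the energy is unchanged.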

Implement calibration of the gammaE variable in the class (in the datalib folder) and call it in the analysis (in the utility folder).
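
The form of the calibration stored in the metadata files is not described here. Assuming it is a polynomial in the raw value (an assumption to check against the actual metadata), applying it could look like the following Python sketch. The function name and the coefficient ordering are hypothetical.

```python
def calibrate(raw, coefficients):
    """Evaluate c0 + c1*raw + c2*raw**2 + ... using Horner's scheme.

    coefficients : sequence (c0, c1, c2, ...), lowest order first (assumed ordering)
    """
    energy = 0.0
    for c in reversed(coefficients):
        energy = energy * raw + c
    return energy


# An already calibrated run (such as RUN1) corresponds to the identity calibration.
IDENTITY = (0.0, 1.0)
```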

sprint 4 : metadata file and histogram production

In this sprint the students implement additional functionality. The first pair will work on a Python script that produces a histogram for a given nucleus. The second pair will implement the generation of a YAML metadata file from the main analysis.

We would like the analysis to produce a rich metadata file recording the production date, the author, and the inputs. This file should be machine readable, and the collaboration has settled on the YAML format.
Code snippets showing how to write such a file are provided here: <code snippet link>
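
Independently of the linked snippet, the content of such a file can be illustrated with a short Python sketch that uses only the standard library. The key names (date, author, inputs) follow the issue text; the function names are hypothetical, and the small emitter below only handles a flat mapping of scalars and lists. A real project would more likely rely on a YAML library.

```python
import datetime
import getpass
import json


def build_metadata(inputs, author=None):
    """Collect the production date, the author and the input files."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "date": now.isoformat(timespec="seconds"),
        "author": author or getpass.getuser(),
        "inputs": list(inputs),
    }


def to_yaml(metadata):
    """Serialise a flat mapping (scalars and lists of scalars) as YAML text."""
    lines = []
    for key, value in metadata.items():
        if isinstance(value, list):
            if not value:
                lines.append(f"{key}: []")
                continue
            lines.append(f"{key}:")
            # JSON string literals are valid YAML double-quoted scalars.
            lines.extend(f"  - {json.dumps(str(item))}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(str(value))}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical input file names, for illustration only.
    print(to_yaml(build_metadata(["run1.root", "run2.root"])), end="")
```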

We want a Python script under the scripts directory that produces a ROOT histogram for a given nucleus. You can start from the snippet to get a basic Python script that does the drawing. You need to add the corresponding flags to select the nucleus (A, Z) and the width of the gate.
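
The command-line part of such a script could be sketched with argparse as below. The flag names and the default width are hypothetical, and the gate is assumed to be a symmetric window of the given full width around the requested A and Z. The ROOT histogram filling is left out so that the sketch stays self-contained.

```python
import argparse


def parse_args(argv=None):
    """Parse the nucleus selection and the gate width from the command line."""
    parser = argparse.ArgumentParser(
        description="Select events for one nucleus (sketch)."
    )
    parser.add_argument("input", help="analysed data file to read")
    parser.add_argument("-A", "--mass", type=int, required=True,
                        help="mass number A of the selected nucleus")
    parser.add_argument("-Z", "--charge", type=int, required=True,
                        help="atomic number Z of the selected nucleus")
    parser.add_argument("-w", "--gate-width", type=float, default=0.5,
                        help="full width of the gate around A and Z")
    return parser.parse_args(argv)


def in_gate(value, centre, width):
    """True if value lies inside a window of full width `width` around centre."""
    return abs(value - centre) <= width / 2.0


def select_energies(events, a, z, width):
    """Keep the gamma energy of events whose measured (A, Z) fall in the gate.

    events : iterable of (A, Z, energy) tuples (assumed event layout)
    """
    return [e for (ev_a, ev_z, e) in events
            if in_gate(ev_a, a, width) and in_gate(ev_z, z, width)]
```

The selected energies would then be filled into the histogram written by the script.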

Update the documentation to explain the data structure, the type of leaf, and how to run the different programs and scripts.

sprint 5 : Automation using Snakemake

This sprint requires the Snakemake module to have taken place beforehand. After the Snakemake introduction, the students get the chance to develop their own Snakemake workflow covering every step of the analysis.

Based on the Snakefile example provided in the snippet and the example from the presentation, implement a Snakefile for your own analysis.

Modify the Snakefile to add a rule that runs generate_histo. Use the parameters of this rule to set the selected nucleus.

Create a rule using the hadd command to sum the histograms in each file and produce a final one:
hadd <outfile_name> <file1> <file2> ... <filen>
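
Outside Snakemake, the same merge step can be tried from Python. The sketch below only builds and runs the command line shown above; the function names and file names are hypothetical, and hadd must be available on the PATH for run_hadd to work.

```python
import subprocess


def hadd_command(outfile, infiles):
    """Build the argument list: hadd <outfile> <file1> ... <filen>."""
    infiles = list(infiles)
    if not infiles:
        raise ValueError("hadd needs at least one input file")
    return ["hadd", outfile, *infiles]


def run_hadd(outfile, infiles):
    """Run hadd and raise CalledProcessError if it fails."""
    subprocess.run(hadd_command(outfile, infiles), check=True)
```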

sprint 6 : advanced Snakemake

After the initial Snakemake implementation, the students add advanced features to the workflow, producing histograms for several nuclei in batches.

The config.yaml file currently selects only one nucleus. It should take a list of nuclei, and the workflow should process all of them.

The config.yaml file should include the path to the raw data, as it may be in different places depending on how users access the data.
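
How the workflow might turn such a configuration into a list of files can be sketched in plain Python. The key names (raw_data, runs, nuclei), the file name pattern, and the example values are all hypothetical; in a Snakefile the loaded config.yaml is available as a dictionary of the same shape.

```python
import os

# Hypothetical content of config.yaml once loaded as a dictionary.
config = {
    "raw_data": "/path/to/raw",
    "runs": ["run1", "run2"],
    "nuclei": [{"A": 96, "Z": 38}, {"A": 132, "Z": 50}],
}


def raw_inputs(config):
    """One raw file per run, located under the configured raw data path."""
    return [os.path.join(config["raw_data"], f"{run}.root")
            for run in config["runs"]]


def histogram_targets(config, pattern="results/histo_A{A}_Z{Z}.root"):
    """One histogram file per nucleus listed in the configuration."""
    return [pattern.format(**nucleus) for nucleus in config["nuclei"]]
```

Listing the targets in the first rule of the workflow is one way to have every nucleus processed in a single run.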

The documentation says nothing about the histogram generation.
There is also no documentation on the output file and the metadata.

Add an AUTHOR file detailing the team behind the software. Add a CONTRIBUTOR file for people outside the development team who contributed (bug reports, advice, ...).

Final Steps

The students are now ready for production. The next step is to start a discussion with a group from the EXILL track. They have to prepare a short presentation of their work to deliver to the other team and discuss further actions with them!