dataset and goals
The EXILL dataset is based on a simulation of a neutron-induced fission experiment. In this experiment, a thermal neutron beam impinges on a 235U target. The 236U compound nucleus then fissions at rest. The two fission fragments de-excite by gamma-ray emission, and the gamma rays are detected by an HPGe array.
The goal is to produce validated physics outputs (gamma spectra, gamma-gamma matrices, gated projections) and to automate the pipeline with reproducible steps. For this, the students work in pairs as Pilot/Copilot. A group is typically 4 to 5 students, so the course is designed with 2 pairs in mind. This can be adjusted depending on the actual configuration of your school.
The list proposed here is the minimal set of steps needed to complete the course. It is organised in sprints, which are structured sessions with explicit goals. At the beginning of a sprint, the students discuss the goals with the tutors, then the tutors create git issues. You will find those issues in the following code blocks. During the sprint, tutors can review code before a merge request is accepted, and create new issues for the students to follow. Adjust the level of detail in the git issues to the students' level of autonomy.
sprint 0 : Warming up!
This sprint is a simple warmup focused on repository setup, basic build, and first run of the analysis. No programming is required; it helps the students get used to the sprint process.
Add a conda.yaml file to allow conda/mamba/micromamba environment installation.
Add a LICENSE file to the repository. Choose an appropriate open source license.
To discuss with the students at the end of the sprint:
- Check they added proper instructions to the README file
- Discuss the LICENSE choice
sprint 1 : Introduction to data analysis
In this sprint the students get on board with the data analysis. They have to add variables to the output of the base analysis, which currently provides only the gamma-ray energies.
The analysis code only provides a vector of gamma-ray energies. A branch containing the gamma-ray multiplicity needs to be added.
Add a branch containing the sum-energy of the gamma-rays.
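The two quantities requested in these issues can be illustrated with a minimal sketch of the per-event logic. It is written in plain Python for readability; the actual analysis code is compiled and fills tree branches, so the function name `event_summary` and the sample energies (assumed to be in keV) are hypothetical.

```python
def event_summary(gamma_energies):
    """Return (multiplicity, sum_energy) for the gamma rays of one event.

    gamma_energies: list of gamma-ray energies of a single event
    (units assumed to be keV for this illustration).
    """
    multiplicity = len(gamma_energies)   # number of detected gamma rays
    sum_energy = sum(gamma_energies)     # total detected gamma-ray energy
    return multiplicity, sum_energy


# Hypothetical events: each inner list is the energy vector of one event.
events = [[511.0, 1332.5], [], [121.5, 344.25, 779.0]]
summaries = [event_summary(energies) for energies in events]
```

In the real code, both values would be computed once per event and written to the new branches alongside the existing energy vector.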
sprint 2 : Creating a basic test
In this sprint the students produce a test data file and an associated test program that could be incorporated into a future CI/CD pipeline.
A small sample file needs to be provided for test purposes. Create a small test file with the first 100 events of run1. Put the file in the resources folder and name it sample_raw.root.
A test program is needed to check the content of the produced data. The program will read the file produced by group 1 and perform the following tests:
- the number of events in the tree is 100
- the tree contains branches named "multiplicity" and "sum_energy"
To be done:
- create a test folder to contain the unit test
- edit CMakeLists.txt so that the test compiles
- develop the program
To discuss at the end of the sprint:
- check there are appropriate instructions to run the test in the README file.
Add a README file in the test folder detailing what the test is doing, how to run it, and the possible outcomes.
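The checks listed in this sprint's test issue can be sketched as follows. This is only the decision logic, under the assumption that the number of entries and the branch names have already been read from the tree (in ROOT, for instance, via `TTree::GetEntries()` and `TTree::GetListOfBranches()`). The names `check_sample` and `exit_code` are hypothetical.

```python
REQUIRED_BRANCHES = ("multiplicity", "sum_energy")
EXPECTED_EVENTS = 100


def check_sample(n_events, branch_names):
    """Return a list of failure messages; an empty list means the test passed."""
    failures = []
    if n_events != EXPECTED_EVENTS:
        failures.append(f"expected {EXPECTED_EVENTS} events, found {n_events}")
    for name in REQUIRED_BRANCHES:
        if name not in branch_names:
            failures.append(f"missing branch '{name}'")
    return failures


def exit_code(failures):
    """Map the failure list to a process exit code (0 = success)."""
    return 0 if not failures else 1


# Example with values as they might be read from the sample file.
failures = check_sample(100, ["gammaE", "multiplicity", "sum_energy"])
```

Returning a non-zero exit code on failure is what allows a test runner or a CI job to detect that the test did not pass.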
sprint 3 : Advanced analysis
In this sprint the students implement actual data manipulation. The goal is to produce gamma-gamma correlation matrices and a Python script that makes projections of the matrix. The full group will then apply its first gates on the produced data and check whether the result is consistent with published data.
Step 1: Produce an executable (source file "generate_gg.cxx") that builds a gamma-gamma matrix from a TTree and adds a TH1 total projection. The output file will contain the 2D matrix and the 1D projection histogram.
Step 2: Produce a Python script that takes as input the file produced at step 1. It reads the matrix and creates a 1D projection over an energy range given as a parameter (energy gate).
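To make the two steps of this issue concrete, here is a minimal sketch of the underlying logic in plain Python, without ROOT. It assumes a symmetric matrix filled with every ordered pair of distinct gamma rays in an event and a fixed bin width; the function names and the binning are illustrative assumptions, not the layout of the actual code.

```python
import math


def build_gg_matrix(events, n_bins, kev_per_bin=1.0):
    """Fill a symmetric gamma-gamma matrix from per-event energy lists."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for energies in events:
        bins = [math.floor(e / kev_per_bin) for e in energies]
        for i, bx in enumerate(bins):
            for j, by in enumerate(bins):
                # Skip self-coincidences and energies outside the matrix range.
                if i != j and 0 <= bx < n_bins and 0 <= by < n_bins:
                    matrix[bx][by] += 1
    return matrix


def total_projection(matrix):
    """Project the full matrix onto one axis."""
    return [sum(row) for row in matrix]


def gated_projection(matrix, gate_lo, gate_hi, kev_per_bin=1.0):
    """Project onto the y axis the rows whose x energy lies in [gate_lo, gate_hi]."""
    n_bins = len(matrix)
    first = max(math.floor(gate_lo / kev_per_bin), 0)
    last = min(math.floor(gate_hi / kev_per_bin), n_bins - 1)
    projection = [0] * n_bins
    for bx in range(first, last + 1):
        for by in range(n_bins):
            projection[by] += matrix[bx][by]
    return projection


# Hypothetical events with energies in keV.
events = [[10.5, 20.5], [10.2, 30.1], [20.0, 30.0]]
matrix = build_gg_matrix(events, n_bins=40)
gated = gated_projection(matrix, 10.0, 10.9)
```

With these sample events, gating on the 10 keV bin leaves one count each in the 20 keV and 30 keV bins, which is the kind of coincidence relation the students will look for in the real spectra.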
Step 1: From the file resources/EXILL_Partners.root, analyse the fission yield histograms and determine which nucleus is the most populated, so that the work can be done on a single run.
Step 2: Using the GetPartner program, determine which nucleus is the most likely partner of this most populated nucleus.
Step 3: From ENSDF, or your favorite nuclear database, determine the first transitions expected in the gamma-ray spectrum of both fission fragments.
Once the developments of the two pairs are validated, check together that everything works properly:
Step 4: Apply gates on the matrix with the developed codes and check consistency
sprint 4 : metadata file
In this sprint the students will implement additional functionality for metadata production. We want the analysis to produce a rich metadata file recording the date of production, the author, and the inputs. We would like this file to be machine readable, and the collaboration settled on the YAML format.
The first pair will implement metadata production for the files generated by the EXILL analysis code (yaml format or any other choice). The second pair will implement metadata production for the files generated by the gamma-gamma code.
Produce metadata for the files generated by the EXILL analysis code (yaml format or any other choice)
Produce metadata for the files generated by the gamma-gamma analysis code (yaml format or any other choice)
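As an illustration of what such a metadata file might contain, the sketch below collects the three items mentioned for this sprint (date of production, author, inputs) and writes them as a small YAML document. The field names, the file names, and the hand-written emitter are assumptions made for this example; the emitter only handles flat strings and lists of strings, and a real implementation would more likely rely on a YAML library.

```python
import json
from datetime import datetime, timezone


def build_metadata(author, inputs, output, produced_at=None):
    """Collect the metadata of one produced file in a plain dictionary."""
    if produced_at is None:
        produced_at = datetime.now(timezone.utc)
    return {
        "output": output,
        "author": author,
        "produced_at": produced_at.isoformat(),
        "inputs": list(inputs),
    }


def to_yaml(metadata):
    """Serialise a flat dict of strings and string lists as YAML text.

    Strings are written as JSON strings, which are valid YAML
    double-quoted scalars.
    """
    lines = []
    for key, value in metadata.items():
        if isinstance(value, list):
            if not value:
                lines.append(f"{key}: []")
                continue
            lines.append(f"{key}:")
            lines.extend(f"  - {json.dumps(item)}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical file names and author.
metadata = build_metadata(
    author="A. Student",
    inputs=["run1.root"],
    output="run1_analysed.root",
    produced_at=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
yaml_text = to_yaml(metadata)
```

Writing `yaml_text` next to the produced file keeps the provenance information with the data it describes.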
Update the documentation to explain the data structure, the leaf types, and how to run the different programs and scripts.
sprint 5 : Automation using Snakemake and data calibration
This sprint requires the Snakemake module to have taken place beforehand. After the Snakemake introduction, the students get the chance to develop their own Snakemake workflow detailing every step of the analysis.
Run1 is calibrated, but the other runs are not. All the runs need to be calibrated using the provided calibration files so that they can be summed at the end.
Based on the Snakefile example provided in the snippet and the example from the presentation, implement a Snakefile for your own analysis. The pipeline should generate the analysed trees and the gamma-gamma matrices.
Implement calibration of the gammaE variable in the class (in the datalib folder) and call it in the analysis (in the utility folder).
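The format of the provided calibration files is not described here, so the following is only a sketch under the assumption that a calibration is a polynomial in the raw energy, with coefficients given in increasing order of power. The function name `calibrate` and the coefficient values are hypothetical.

```python
def calibrate(raw_energy, coefficients):
    """Evaluate c0 + c1*x + c2*x**2 + ... at x = raw_energy (Horner's method).

    coefficients: sequence (c0, c1, c2, ...) in increasing order of power.
    """
    result = 0.0
    for coefficient in reversed(coefficients):
        result = result * raw_energy + coefficient
    return result


# Hypothetical linear calibration: offset 1.5, gain 0.5.
calibrated = [calibrate(e, (1.5, 0.5)) for e in (100.0, 200.0)]
```

In the actual code, the same evaluation would be a method of the class in the datalib folder, applied to each gammaE value before the tree is filled.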
Once the developments of the two pairs are validated, check together that all the runs are indeed well calibrated, and add a rule in Snakemake to sum all the matrices.
Create a rule using the hadd command to sum the matrices in each file and produce a final one: hadd <outfile_name> <file1> <file2> ... <filen>
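The command line quoted in this issue can be assembled programmatically, which is convenient when the list of input files comes from the workflow. The sketch below only builds the argument list in the order given above (output first, then inputs); the helper name `hadd_command` and the file names are hypothetical, and actually running the command requires a ROOT installation.

```python
import shlex


def hadd_command(outfile, infiles):
    """Build the argument list for: hadd <outfile> <file1> ... <filen>."""
    infiles = list(infiles)
    if not infiles:
        raise ValueError("hadd needs at least one input file")
    return ["hadd", outfile, *infiles]


# Hypothetical per-run matrix files.
command = hadd_command("gg_sum.root", ["gg_run1.root", "gg_run2.root"])
command_line = shlex.join(command)  # shell-safe string, e.g. for a log message
```

The list form can be passed to `subprocess.run(command, check=True)`, while the joined string is the kind of line that would appear in the shell section of a workflow rule.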
sprint 6 : advanced analysis
To go deeper into the analysis and be able to measure nuclei with lower yields, the students will need to integrate advanced features. Background gates need to be added for cleaner projections, and gamma-gamma matrices can be produced after a first energy selection, leading to doubly gated spectra.
Add an option to provide background gates in the Python script used for projections
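One common way to use background gates is to subtract from the gated projection a background projection scaled by the ratio of the gate widths. The sketch below shows that approach on a matrix stored as nested lists, with gates given as inclusive bin ranges; this is one possible convention, chosen for illustration, and the function names are hypothetical.

```python
def project(matrix, first_bin, last_bin):
    """Sum the rows first_bin..last_bin (inclusive) into a 1D projection."""
    n_bins = len(matrix)
    projection = [0.0] * n_bins
    for bx in range(first_bin, last_bin + 1):
        for by in range(n_bins):
            projection[by] += matrix[bx][by]
    return projection


def background_subtracted(matrix, gate, background_gates):
    """Gated projection minus the width-scaled background projection.

    gate: (first_bin, last_bin); background_gates: list of such pairs.
    """
    gate_projection = project(matrix, *gate)
    gate_width = gate[1] - gate[0] + 1
    background_width = sum(hi - lo + 1 for lo, hi in background_gates)
    if background_width == 0:
        return gate_projection  # no background gate given
    background = [0.0] * len(matrix)
    for lo, hi in background_gates:
        for index, value in enumerate(project(matrix, lo, hi)):
            background[index] += value
    scale = gate_width / background_width
    return [g - scale * b for g, b in zip(gate_projection, background)]


# Toy 6x6 matrix: a peak in row 2 sitting on a flat background in rows 1 and 3.
matrix = [[0] * 6 for _ in range(6)]
matrix[2][4] = 10
matrix[1][4] = 2
matrix[3][4] = 2
clean = background_subtracted(matrix, gate=(2, 2), background_gates=[(1, 1), (3, 3)])
```

Placing background gates on both sides of the peak, as in this toy example, averages out a slope in the underlying background.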
Add an option to provide a first gate/background condition to pre-condition the gamma-gamma matrices on a first gamma-ray energy
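The pre-conditioning described in this issue can be sketched as follows: an event contributes to the matrix only if it contains a gamma ray inside the first gate, and that gamma ray is then excluded from the pairs. For simplicity this sketch uses only the first in-gate gamma ray of each event as the condition, assumes 1 keV bins, and leaves out the background condition; all names are hypothetical.

```python
import math


def build_preconditioned_matrix(events, first_gate, n_bins):
    """Fill a gamma-gamma matrix from events passing a first energy gate."""
    gate_lo, gate_hi = first_gate
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for energies in events:
        in_gate = [i for i, e in enumerate(energies) if gate_lo <= e <= gate_hi]
        if not in_gate:
            continue  # event does not satisfy the first gate
        # Remove the gating gamma ray, then pair up the remaining ones.
        remaining = energies[:in_gate[0]] + energies[in_gate[0] + 1:]
        bins = [math.floor(e) for e in remaining]
        for i, bx in enumerate(bins):
            for j, by in enumerate(bins):
                if i != j and 0 <= bx < n_bins and 0 <= by < n_bins:
                    matrix[bx][by] += 1
    return matrix


# Hypothetical events (keV); only the first one yields pairs after the gate.
events = [[100.5, 200.5, 300.5], [200.5, 300.5], [100.2, 400.1]]
matrix = build_preconditioned_matrix(events, first_gate=(100.0, 101.0), n_bins=500)
```

A projection of this matrix with a second gate then gives a doubly gated spectrum, at the cost of requiring events with at least three gamma rays.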
The documentation does not say anything about the histogram generation. There is also no documentation on the output file and the metadata.
Add an AUTHOR file detailing the team behind the software. Add a CONTRIBUTOR file for people outside the development team who contributed (bug reports, advice, ...).
Final Steps
The students are now ready for production.
The next step is to start a discussion with a group from the AGATA track. The students have to prepare a short presentation of their work to deliver to the other team, and discuss further actions with them!
Together, they can try to build the level scheme of 90Br, which contains unpublished (simulated) gamma rays.