
Cosmic Ray Veto ML training code #6

Open
sam-grant wants to merge 3 commits into Mu2e:main from sam-grant:sgrant/crv-cosmic

@sam-grant sam-grant commented Apr 29, 2026

Cosmic Ray Veto ML training code

Adds CrvCosmic/, an XGBoost-based classifier for vetoing cosmic-ray-induced backgrounds, ported from sam-grant/mu2e-cosmic. The model takes per-coincidence CRV and tracker features and outputs a probability that the coincidence matched with the track is cosmic-induced. Trained on CosmicCRYSignalAllOnSpillTriggered (pure cosmics) vs. CeEndpointMix2BBTriggered (beam + pileup).

Only the cuts and helpers actually used by the ML preselection are ported from sam-grant/mu2e-cosmic; the non-ML parts of the upstream cosmic-background framework are excluded.

Pipeline

  1. Process: Read ROOT EventNtuple files, apply the preprocessing cutset, and flatten per-coincidence features into Awkward Array output.
  2. Assemble: Load processed CRY and CE-mix outputs, label, combine, and produce K-fold indices for nested cross-validation (CV).
  3. Train: Fit XGBoost on each fold, find the per-fold operating threshold at a target veto efficiency, then retrain on the full set with the CV-averaged threshold and export to .ubj.
  4. Validate: ROC AUC, score distributions, feature importance, and a "money table" comparing the ML model against the simple ∆t cut baseline.
  5. Optimise: (optional) grid search over XGBoost hyperparameters via k-fold nested CV, minimising deadtime at the target veto efficiency.
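
As a rough illustration of the threshold logic in steps 2, 3, and 5, the sketch below deals indices into K folds, finds the score threshold that vetoes a target fraction of cosmics in each held-out fold, and averages the thresholds. It is not the code in src/ml/: the function names are hypothetical, the scores are synthetic Gaussians rather than XGBoost outputs, and the fraction of CE-mix coincidences above threshold is used only as an assumed stand-in for deadtime.

```python
import math
import random
from statistics import mean


def kfold_indices(n_samples, n_folds, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def threshold_at_efficiency(cosmic_scores, target_eff):
    """Largest threshold t such that at least target_eff of cosmics score >= t."""
    ranked = sorted(cosmic_scores, reverse=True)
    n_keep = max(1, math.ceil(target_eff * len(ranked)))
    return ranked[n_keep - 1]


def vetoed_fraction(scores, threshold):
    """Fraction of entries with score >= threshold (i.e. that would be vetoed)."""
    return sum(s >= threshold for s in scores) / len(scores)


if __name__ == "__main__":
    # Synthetic stand-ins for classifier scores (assumption, not real model output).
    rng = random.Random(1)
    cosmic = [min(1.0, max(0.0, rng.gauss(0.9, 0.1))) for _ in range(1000)]
    ce_mix = [min(1.0, max(0.0, rng.gauss(0.1, 0.1))) for _ in range(1000)]

    # One threshold per held-out fold, then the CV average.
    folds = kfold_indices(len(cosmic), n_folds=5)
    per_fold = [
        threshold_at_efficiency([cosmic[i] for i in fold], target_eff=0.99)
        for fold in folds
    ]
    cv_threshold = mean(per_fold)

    print("CV-averaged threshold:", cv_threshold)
    print("cosmic veto efficiency:", vetoed_fraction(cosmic, cv_threshold))
    print("CE-mix vetoed fraction (deadtime proxy):", vetoed_fraction(ce_mix, cv_threshold))
```

In the real pipeline the scores in each fold would come from a model fitted on the remaining folds; only the thresholding and averaging are sketched here.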

Layout

  • config/cuts.yaml: Cutset definitions; MLPreprocess is the only one defined here
  • src/core/: Cut flow & feature helpers, postprocessing combiners
  • src/ml/: MLProcessor, LoadML, AssembleDataset, Train, Validate, Optimise
  • src/utils/: IO, histogram booking, plotting, mu2e.mplstyle
  • run/run_ml_prep.py: Entry point for processing ROOT files into parquet/pkl for training
  • notebooks/: assemble, feature engineering, optimise, train, validate

Dependencies

Requires Mu2e/pyutils (provided by pyenv ana) for pyutils.pycut.CutManager, plus xgboost, scikit-learn, awkward, pandas, hist, pyarrow, h5py, joblib, pyyaml.
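
A quick way to confirm the environment before running anything is to check that each package can be located. This is a minimal sketch, not part of the PR: `missing_modules` is a hypothetical helper, and the list uses import names (scikit-learn imports as `sklearn`, pyyaml as `yaml`).

```python
from importlib.util import find_spec

# Top-level import names for the packages listed above.
REQUIRED = [
    "pyutils", "xgboost", "sklearn", "awkward", "pandas",
    "hist", "pyarrow", "h5py", "joblib", "yaml",
]


def missing_modules(names):
    """Return the names that cannot be located on the current import path."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialised modules.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

This only checks that the modules can be found; it does not import them or verify versions.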

Testing

Ran end-to-end on the gpvms (commit b78cf8f).

@sam-grant sam-grant marked this pull request as ready for review April 29, 2026 21:21
@sam-grant
Author

@oksuzian @AndrewEdmonds11 would you mind taking a look? I'm not able to add reviewers :)
