7 changes: 7 additions & 0 deletions README.md
@@ -438,6 +438,11 @@ Each `model_id` has a fixed receptive field \(R\):
- **model 4**: \(R=13\)
- **model 5**: \(R=13\)

#### Training recommendations

- **Models 1, 4, 5 (uncorrelated matching):** Train for at least **100 epochs**. Fewer epochs will yield under-trained models.
- **Shots per epoch:** Use **67 million** shots per epoch when training with 8 GPUs (`PREDECODER_TRAIN_SAMPLES=67108864`). Using fewer shots per epoch produces worse results.
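
The recommended value is a power of two, which splits evenly across 8 GPUs. A quick sanity check (the per-GPU split shown here is illustrative arithmetic, not the pipeline's actual sharding logic):

```python
# PREDECODER_TRAIN_SAMPLES = 67108864 is 2**26, i.e. ~67.1M shots per epoch.
TRAIN_SAMPLES = 67_108_864
N_GPUS = 8

assert TRAIN_SAMPLES == 2**26
shots_per_gpu = TRAIN_SAMPLES // N_GPUS  # 8_388_608 = 2**23 shots per GPU per epoch
```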

#### Distance / rounds semantics

- Top-level `distance` / `n_rounds` are the **evaluation targets** (what you care about in inference).
@@ -563,6 +568,8 @@ The five grouped totals are:
- If `max_group >= 6e-3`: parameters are **not** modified (the training log emits a warning, since this may indicate a configuration error).
- Non-surface-code types (`code_type != "surface_code"`) are never upscaled.

**Algorithm in brief:** The pipeline stores `p_max = max(P_prep, P_meas, P_idle_cnot, P_idle_spam, P_cnot)` from the full 25-parameter noise vector and rescales the entire vector by `0.006 / p_max` so that `p_max` is raised to **0.6%** (6 × 10⁻³). The original noise model is preserved unchanged for evaluation.
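
A minimal sketch of this upscaling step, assuming the grouped totals are sums over slices of the 25-parameter vector (the function name and the `group_slices` argument are illustrative, not the pipeline's actual API):

```python
import numpy as np

def upscale_noise(noise_vec, group_slices, target=6e-3):
    """Rescale the 25-parameter noise vector so its largest grouped total
    (P_prep, P_meas, P_idle_cnot, P_idle_spam, P_cnot) reaches `target`
    (0.6%). Sketch only; `group_slices` maps parameter positions to the
    five groups and is an assumed input.
    """
    noise_vec = np.asarray(noise_vec, dtype=float)
    p_max = max(noise_vec[s].sum() for s in group_slices)
    if p_max >= target:
        # Already at/above 0.6%: leave parameters unmodified
        # (the pipeline logs a warning in this case).
        return noise_vec
    # Uniformly rescale the full vector so p_max lands exactly on 0.6%.
    return noise_vec * (target / p_max)
```

The original (unscaled) noise model is kept separately for evaluation; only the training-time copy is rescaled.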

We have found that training on denser syndromes and then evaluating on sparser data produces better results than training directly on sparse data.

#### Skipping noise upscaling
4 changes: 2 additions & 2 deletions TRAINING.md
@@ -141,8 +141,8 @@ export CONFIG_NAME=config_qec_decoder_r13_fp8

| Variable | Default | Description |
|----------|---------|-------------|
| `PREDECODER_TRAIN_EPOCHS` | `100` | Total number of training epochs. |
| `PREDECODER_TRAIN_SAMPLES` | config-defined | Samples per epoch. Bypasses auto-scaling when set explicitly. |
| `PREDECODER_TRAIN_EPOCHS` | `100` | Total number of training epochs. For models 1, 4, 5 (uncorrelated matching), use at least **100** epochs; fewer epochs will yield under-trained models. |
| `PREDECODER_TRAIN_SAMPLES` | config-defined | Samples per epoch. Bypasses auto-scaling when set explicitly. For best results with 8 GPUs, use **67 million** shots per epoch (`67108864`); fewer shots per epoch will produce worse results. |
| `PREDECODER_LR_MILESTONES` | config-defined | Comma-separated LR schedule milestone fractions (e.g. `0.25,0.5,1.0`). |
| `PREDECODER_TIMING_RUN` | unset | Set `1` for timing/benchmarking mode (disables some overhead). |
| `PREDECODER_TORCH_COMPILE` | `0` when run via `sbatch_train.sh`, otherwise unset | `0` to disable `torch.compile`, `1` to enable. |