Cooling Down Checkpoints: Best Practices for Model Evaluation
Best practices for cooling down model checkpoints before evaluation to improve validation loss comparisons.
Sam Foreman 2025-11-12
- Simple Experiment to Compare Validation Loss
- Cooling Down
- Convert to Universal (Optional)
- W&B Report
Simple Experiment to Compare Validation Loss


Cooling Down
- 256 Nodes of Aurora:
  - Cooled down over last 10%:
    - W&B Run: volcanic-blaze-4312
    - Explicit command:
      ROPE_THETA=50000 \
      GRAD_ACC_STEPS=2 \
      MICRO_BATCH=1 \
      USE_ACTIVATION_CHECKPOINTING=0 \
      ZERO_STAGE=0 \
      TRAIN_TOKENS=4673780159710 \
      OPT=sophiag \
      DATA_FILE_LIST=ALCF/data-lists/aurora/olmo-mix-1124.txt \
      LR_DECAY_STYLE=constant \
      LOAD=cooldown-checkpoints/sophiag-global-step-73500/global_step73500 \
      bash train_alcf.sh \
      --no-load-lr-state \
      --lr_constant_plus_cooldown \
      --lr_constant_plus_cooldown_frac 0.10
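To make the "constant plus cooldown" idea concrete, here is a minimal sketch of a schedule that holds the learning rate constant and then decays it over the final fraction of training. The function name, the linear decay shape, and the `min_lr` floor are assumptions for illustration; the schedule actually selected by `--lr_constant_plus_cooldown` in the training code may use a different shape or different step accounting.

```python
def constant_plus_cooldown_lr(step, total_steps, peak_lr,
                              cooldown_frac=0.10, min_lr=0.0):
    """Hypothetical schedule: hold peak_lr, then decay linearly to min_lr.

    The linear shape is an assumption, not necessarily what the
    training script implements.
    """
    if not 0.0 < cooldown_frac <= 1.0:
        raise ValueError("cooldown_frac must be in (0, 1]")
    # Number of steps spent cooling down (at least one).
    cooldown_steps = max(1, round(total_steps * cooldown_frac))
    cooldown_start = total_steps - cooldown_steps
    if step <= cooldown_start:
        # Constant phase.
        return peak_lr
    # Fraction of the cooldown window completed, clamped to 1.
    progress = min(1.0, (step - cooldown_start) / cooldown_steps)
    return min_lr + (peak_lr - min_lr) * (1.0 - progress)


if __name__ == "__main__":
    # peak_lr=1.0 is a normalized placeholder, not a value from the run.
    for s in (0, 900, 950, 1000):
        print(s, constant_plus_cooldown_lr(s, total_steps=1000, peak_lr=1.0))
```

With `cooldown_frac=0.10`, the learning rate stays at its peak for the first 90% of steps and only the last 10% are spent decaying, which matches the "cooled down over last 10%" setup described above.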
Convert to Universal (Optional)
TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1 python3 ALCF/ds_to_universal.py \
--input_folder test_rollback/global_step136000 \
--output_folder test_rollback/global_step136000_universal
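If several checkpoints need converting, the command above can be wrapped in a small helper. The sketch below is a hypothetical convenience wrapper: the function names and the `dry_run` flag are assumptions, and only the script path, the two flags, the `_universal` suffix, and the environment variable are taken from the command shown.

```python
import os
import subprocess
from pathlib import PurePosixPath


def build_universal_convert_cmd(ckpt_dir):
    """Build the argv for ALCF/ds_to_universal.py (hypothetical helper)."""
    ckpt = PurePosixPath(ckpt_dir)
    # Mirror the naming used above: <checkpoint>_universal next to the input.
    out = ckpt.with_name(ckpt.name + "_universal")
    return [
        "python3", "ALCF/ds_to_universal.py",
        "--input_folder", str(ckpt),
        "--output_folder", str(out),
    ]


def convert(ckpt_dir, dry_run=True):
    """Print the command, or run it when dry_run is False."""
    cmd = build_universal_convert_cmd(ckpt_dir)
    # Same environment variable as in the shell command above.
    env = dict(os.environ, TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD="1")
    if dry_run:
        print("TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1 " + " ".join(cmd))
    else:
        subprocess.run(cmd, env=env, check=True)
    return cmd


if __name__ == "__main__":
    convert("test_rollback/global_step136000")
```

Defaulting to `dry_run=True` prints the command so it can be checked before anything is launched.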
W&B Report
<iframe loading="lazy" src="https://api.wandb.ai/links/aurora_gpt/dek99dmd" align="center" frameborder="0" webkitallowfullscreen allowfullscreen style="border:none;height:1024px;width:100%">
</iframe>