Sam Foreman's personal site.


Running 50k Python Processes on Aurora with ezpz yeet

How ezpz yeet distributes Python environments to every worker node in an HPC job, and how it scales from 8 to 4096 nodes on Aurora.

On large HPC clusters, every Python import that touches the shared filesystem is a small tax — and at scale, a small tax paid by every rank turns into minutes of dead time before training even starts. ezpz yeet copies any directory or tarball to node-local /tmp/ storage on every worker in your job, so subsequent imports, checkpoint loads, and config reads hit local SSD instead of Lustre.

This post covers what yeet does, how it scales, and a 10-point benchmark sweep on Aurora from 8 to 4096 nodes.

Note: ezpz yeet-env was renamed to ezpz yeet. The old name still works as a deprecated alias.

What it does

Inside an interactive job allocation:

ezpz yeet                          # no args → syncs the active venv
ezpz yeet .venv.tar.gz             # positional shorthand for --src
ezpz yeet --src /path/to/dataset   # any directory or tarball

The default flow (no args):

  1. Detect the active Python environment via sys.prefix.
  2. Discover all nodes from the job’s hostfile (PBS or SLURM).
  3. Copy the environment to /tmp/<env-name>/ on the current node.
  4. Patch activate scripts, shebangs, and symlinks for the new /tmp/ location.
  5. Distribute the patched copy to every remote node via a greedy rsync fan-out.

For non-venv sources (datasets, model checkpoints, generic directories), step 4 is skipped and the trailing message changes to a generic “Synced to {dst}/ on N node(s)”.
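Step 2, plus the split between the current node (which gets the local copy) and the remote fan-out targets, might look roughly like this. It is a minimal sketch, not ezpz's code: `read_hostfile` and `remote_targets` are hypothetical names. It assumes a PBS-style hostfile with one hostname per line (repeated once per slot) and matches the current node by short hostname. On SLURM you would expand SLURM_NODELIST with `scontrol show hostnames` rather than parsing it by hand.

```python
from __future__ import annotations

import os
import socket
from pathlib import Path


def read_hostfile(path: str | os.PathLike) -> list[str]:
    """Return unique hostnames from a PBS-style hostfile, in file order."""
    seen: dict[str, None] = {}  # dict keys preserve insertion order
    for line in Path(path).read_text().splitlines():
        host = line.strip()
        if host:
            seen.setdefault(host, None)
    return list(seen)


def short(host: str) -> str:
    # "node01.cluster.example" -> "node01"
    return host.split(".", 1)[0]


def remote_targets(hosts: list[str], local: str | None = None) -> list[str]:
    """Every host except the one we are on (it receives the local copy)."""
    me = short(local or socket.gethostname())
    return [h for h in hosts if short(h) != me]


if __name__ == "__main__":
    hostfile = os.environ.get("PBS_NODEFILE")
    if hostfile:
        hosts = read_hostfile(hostfile)
        print(f"{len(hosts)} node(s), {len(remote_targets(hosts))} remote target(s)")
```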

After yeet, switch to the local copy and launch:

deactivate 2>/dev/null
source /tmp/.venv/bin/activate
cd /path/to/your/project        # shared FS for data/outputs
ezpz launch python3 -m your_app.train

/tmp/ is node-local — keep your working directory on a shared filesystem so all ranks can read inputs and write outputs.
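Both mistakes (still running the Lustre interpreter, or working from node-local /tmp/) are cheap to catch at startup. The helper below is hypothetical and not part of ezpz. It only compares sys.prefix and the working directory against /tmp/, and it does not resolve symlinks.

```python
from __future__ import annotations

import os
import sys


def preflight(prefix: str | None = None, cwd: str | None = None) -> list[str]:
    """Return problems with the current process layout (empty list if OK)."""
    prefix = os.path.abspath(prefix or sys.prefix)
    cwd = os.path.abspath(cwd or os.getcwd())
    problems = []
    # The interpreter should come from the node-local copy...
    if not prefix.startswith("/tmp/"):
        problems.append(f"interpreter is not the node-local copy: sys.prefix={prefix}")
    # ...while inputs/outputs should live on the shared filesystem.
    if cwd == "/tmp" or cwd.startswith("/tmp/"):
        problems.append(f"working directory is node-local: cwd={cwd}")
    return problems
```

Calling `preflight()` at the top of the training script and logging any returned messages on rank 0 is usually enough.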

CLI surface

ezpz yeet [SRC] [--src PATH] [--dst PATH] [--hostfile PATH]
          [--copy | --compress] [--dry-run]

| Arg / Flag | Default | Description |
|---|---|---|
| SRC (positional) | | Source path. Shorthand for --src. Mutually exclusive with --src |
| --src | active venv / conda env | Source path. May also be a .tar.gz / .tgz (see Tarball source) |
| --dst | /tmp/<basename>/ | Destination on each node |
| --hostfile | auto-detect | Hostfile for node list |
| --copy | | Use cp -a for the local copy (faster on Lustre) |
| --compress | | tar.gz → copy → extract (least Lustre metadata I/O) |
| --dry-run | | Preview without transferring |
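The table maps fairly directly onto argparse. This is a hypothetical re-creation from the table and usage line, not ezpz's actual parser: it enforces the SRC / --src exclusivity with an explicit check and uses a mutually exclusive group for `[--copy | --compress]`.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="ezpz yeet")
    p.add_argument("src_pos", nargs="?", metavar="SRC", help="shorthand for --src")
    p.add_argument("--src", help="directory or .tar.gz/.tgz (default: active env)")
    p.add_argument("--dst", help="destination on each node (default: /tmp/<basename>/)")
    p.add_argument("--hostfile", help="hostfile for node list (default: auto-detect)")
    method = p.add_mutually_exclusive_group()  # at most one local-copy method
    method.add_argument("--copy", action="store_true", help="local copy with cp -a")
    method.add_argument("--compress", action="store_true", help="tar.gz -> copy -> extract")
    p.add_argument("--dry-run", action="store_true", help="preview without transferring")
    return p


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.src_pos and args.src:
        parser.error("SRC and --src are mutually exclusive")
    args.src = args.src or args.src_pos  # None means "use the active env"
    return args
```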

Choosing a local copy method

The default rsync is best for incremental updates (after a pip install, etc.) but slow for initial copies on Lustre because it stat()s every file. For the first transfer, prefer one of the faster methods:

| Method | Best for | How it works |
|---|---|---|
| --copy | fast initial copy | cp -a: sequential dir walk, no checksums |
| --compress | slow Lustre / large envs | tar.gz → copy 1 file → extract locally |
| (default) | incremental updates | rsync -rlD: only transfers changed files |

# First time: compress for minimal Lustre I/O
ezpz yeet --compress

# Or: cp for simpler fast copy
ezpz yeet --copy

# After pip install: rsync only sends diffs
ezpz yeet

All three methods only affect the local Lustre → /tmp/ copy. Remote node distribution always uses rsync.
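Concretely, the three methods correspond to command sequences like the ones below. This is a sketch under assumptions: `local_copy_cmds` and `run` are hypothetical, and the --compress branch assumes the archive is written next to the source before being copied, which the post does not specify. Running each command with check=True mirrors the behaviour described under How it works: a failed local copy stops everything before any fan-out starts.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import PurePosixPath


def local_copy_cmds(method: str, src: str, dst: str) -> list[list[str]]:
    """Commands for the Lustre -> /tmp/ step; method is 'rsync', 'copy' or 'compress'."""
    s, d = PurePosixPath(src), PurePosixPath(dst)
    if method == "rsync":  # default: only changed files are transferred
        return [["rsync", "-rlD", f"{s}/", f"{d}/"]]
    if method == "copy":  # cp -a: one sequential walk, no checksums
        return [["mkdir", "-p", str(d)], ["cp", "-a", f"{s}/.", f"{d}/"]]
    if method == "compress":  # a single archive is copied, then extracted locally
        remote_tar = f"{s}.tar.gz"  # hypothetical staging location
        local_tar = f"{d}.tar.gz"
        return [
            ["tar", "-czf", remote_tar, "-C", str(s.parent), s.name],
            ["cp", remote_tar, local_tar],
            ["mkdir", "-p", str(d)],
            ["tar", "-xzf", local_tar, "--strip-components=1", "-C", str(d)],
            ["rm", "-f", local_tar],
        ]
    raise ValueError(f"unknown method: {method!r}")


def run(cmds: list[list[str]], dry_run: bool = False) -> None:
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # first failure raises, aborting the rest
```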

Tarball source

If you already have a .tar.gz (e.g. one built earlier with ezpz tar-env, or shipped with a project), pass it directly:

ezpz yeet --src /lus/.../my-env.tar.gz

This is similar to --compress but skips the create step — the tarball is copied to /tmp/ and extracted there:

  1. cp /lus/.../my-env.tar.gz /tmp/my-env.tar.gz
  2. tar -xzf /tmp/my-env.tar.gz --strip-components=1 -C /tmp/my-env/
  3. Patch shebangs / activate scripts (auto-detected from bin/activate)
  4. Delete the tarball
  5. Fan-out /tmp/my-env/ to all worker nodes via rsync

Both .tar.gz and .tgz are recognized. Default destination is /tmp/<basename-without-suffix>/.
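The destination naming and the extract step can be approximated in pure Python with the standard tarfile module. `default_dst` and `extract_strip1` are hypothetical helpers. The stripping logic mimics `--strip-components=1` by dropping the first path component of each member, which assumes the archive has a single top-level directory, as step 2 above does.

```python
from __future__ import annotations

import tarfile
from pathlib import Path

TAR_SUFFIXES = (".tar.gz", ".tgz")


def default_dst(src: str, root: str = "/tmp") -> Path:
    """/tmp/<basename-without-suffix>/ for tarballs, /tmp/<basename>/ otherwise."""
    name = Path(src.rstrip("/")).name
    for suffix in TAR_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return Path(root) / name


def extract_strip1(tarball: str | Path, dst: str | Path) -> None:
    """Rough equivalent of `tar -xzf TARBALL --strip-components=1 -C DST`."""
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        members = []
        for m in tar.getmembers():
            parts = m.name.split("/", 1)
            if len(parts) < 2 or not parts[1]:
                continue  # skip the top-level directory entry itself
            m.name = parts[1]
            if m.islnk():  # hard-link targets are archive paths too
                m.linkname = m.linkname.split("/", 1)[-1]
            members.append(m)
        # Use the safe "data" extraction filter where this Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dst, members=members, **kwargs)
```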

Generic (non-venv) sources

yeet works on any directory:

# A pre-downloaded HF model checkpoint
ezpz yeet ~/models/Llama-3.1-8B

# A dataset shard
ezpz yeet --src /lus/datasets/imagenet-shard-0

When the source isn’t a venv (no bin/activate) and isn’t a conda env (no conda-meta/), the path-patching step is skipped.
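That detection rule is small enough to write out. `classify_source` is a hypothetical name; it checks only the two markers the post mentions.

```python
from __future__ import annotations

from pathlib import Path


def classify_source(path: str | Path) -> str:
    """'venv' if bin/activate exists, 'conda' if conda-meta/ exists, else 'generic'."""
    p = Path(path)
    if (p / "bin" / "activate").is_file():
        return "venv"
    if (p / "conda-meta").is_dir():
        return "conda"
    return "generic"


def needs_patching(path: str | Path) -> bool:
    # Path patching only makes sense for environments, not plain data directories.
    return classify_source(path) != "generic"
```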

How it works

graph TD
    A["ezpz yeet"] --> B["Detect source env"]
    B --> C["Discover nodes<br/>(PBS_NODEFILE / SLURM_NODELIST)"]
    C --> D["Copy to local /tmp/<br/>(rsync, cp -a, or tar.gz)"]
    D --> E["Patch paths + shebangs"]
    E --> F["Greedy rsync fan-out"]
    F --> G["Print activation instructions"]

Local copy + patch

yeet first copies the source to /tmp/<env>/ on the current node using rsync (default), cp -a (--copy), or tar.gz (--compress). If this fails, distribution is aborted immediately — no broken environment gets propagated.

After copying, the venv is patched once in place:

  • Replaces hardcoded VIRTUAL_ENV paths in bin/activate, bin/activate.csh, bin/activate.fish.
  • Re-links python3 symlinks to the system Python.
  • Updates pyvenv.cfg.
  • Rewrites shebangs in every entry-point script (ezpz, pip, torchrun, etc.) — pip bakes absolute paths into these at install time, so without this step they’d still point at the original Lustre location.
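The shebang and activate-script rewrite amounts to a prefix substitution. A minimal sketch, assuming the old and new prefixes are known strings: `patch_prefix` is hypothetical, it only touches the three activate scripts and the first line of #! scripts in bin/, and it does not cover symlinks, pyvenv.cfg, or the two-line /bin/sh wrapper that some installers emit for very long interpreter paths.

```python
from __future__ import annotations

from pathlib import Path

ACTIVATE_SCRIPTS = ("activate", "activate.csh", "activate.fish")


def patch_prefix(env: str | Path, old: str, new: str) -> list[Path]:
    """Rewrite `old` -> `new` in activate scripts and #! lines under env/bin.

    Returns the files that were changed.
    """
    o, n = old.encode(), new.encode()
    changed = []
    for f in sorted(Path(env, "bin").iterdir()):
        if f.is_symlink() or not f.is_file():
            continue  # symlinks are re-linked separately
        data = f.read_bytes()
        if f.name in ACTIVATE_SCRIPTS:
            new_data = data.replace(o, n)  # e.g. VIRTUAL_ENV="..."
        elif data.startswith(b"#!"):
            first, sep, rest = data.partition(b"\n")
            new_data = first.replace(o, n) + sep + rest  # shebang line only
        else:
            continue  # binaries and other files are left alone
        if new_data != data:
            f.write_bytes(new_data)  # rewriting in place keeps the mode bits
            changed.append(f)
    return changed
```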

This patched copy in /tmp/ becomes the source for all subsequent rsyncs, so there is no per-node patching and no extra SSH round-trips to fix up paths on the remote nodes.

Greedy fan-out

A single source can only push to ~8 nodes at a time before the source NIC saturates, so yeet uses a greedy streaming fan-out: each node that finishes immediately becomes a source for the next available target.

A single thread pool manages all rsyncs. Each source is capped at MAX_PER_SOURCE=8 concurrent outbound rsyncs. As each rsync completes:

  • That node is registered as a new source.
  • New rsyncs are submitted using whichever source has the fewest active transfers (load-balanced).

The fan-out tree grows recursively — each newly-served node immediately becomes a source for up to 8 more:

graph TD
    subgraph "Local copy + patch"
        S["Source<br/>(shared filesystem)"] -->|"rsync / cp / tar.gz"| L["/tmp/ on node00"]
    end
    subgraph "Fan-out (each source serves up to MAX_PER_SOURCE = 8)"
        L --> A1["node01"]
        L --> A2["node02"]
        L --> N1["..."]
        L --> A8["node08"]
        A1 -->|"immediately<br/>becomes source"| B1["nodeA"]
        A1 --> B2["nodeB"]
        A2 --> B3["nodeC"]
        A2 --> B4["nodeD"]
    end
    subgraph "Each gen-2 node also fans out (up to 8)"
        B1 --> C1["nodeE"]
        B1 --> C2["nodeF"]
        B2 --> C3["nodeG"]
        B2 --> C4["nodeH"]
    end
    subgraph "...and so on, recursively, until every node has the env"
        C1 --> D1["nodeI"]
        C1 --> D2["nodeJ"]
        C2 --> D3["nodeK"]
        C2 --> D4["..."]
    end
    classDef src fill:#fab38730,stroke:#fab387,color:#fab387
    classDef gen1 fill:#89b4fa30,stroke:#89b4fa,color:#89b4fa
    classDef gen2 fill:#a6e3a130,stroke:#a6e3a1,color:#a6e3a1
    classDef gen3 fill:#cba6f730,stroke:#cba6f7,color:#cba6f7
    classDef gen4 fill:#f5c2e730,stroke:#f5c2e7,color:#f5c2e7
    classDef placeholder fill:#cccccc20,stroke:#cccccc88,color:#666666
    class S,L src
    class A1,A2,A8 gen1
    class B1,B2,B3,B4 gen2
    class C1,C2,C3,C4 gen3
    class D1,D2,D3,D4 gen4
    class N1 placeholder

Faster nodes don’t wait for slower ones: if node01 finishes in 15 s but node08 takes 30 s, node01 is already serving new targets while node08 is still receiving. The result is approximately O(log N) wall-clock at moderate scale, until per-node contention starts to dominate (more on that below).
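To see where the logarithmic behaviour comes from, here is a small discrete-event simulation of the policy described above: a per-source cap, finished nodes registered as sources immediately, and new transfers handed to the least-loaded source. It is a model, not ezpz's implementation. `simulate_fanout` is hypothetical and assumes every transfer takes the same time unless you pass a `transfer_time` function. With a uniform time of 1 and a cap of 8, each round multiplies the number of sources by 9 (every source refills its 8 slots and each receiver becomes a source), so after k rounds 9^k − 1 remote nodes are served, and 4095 remote nodes need 4 rounds.

```python
from __future__ import annotations

import heapq
import itertools
from collections.abc import Callable

MAX_PER_SOURCE = 8


def simulate_fanout(
    n_targets: int,
    transfer_time: Callable[[int, int], float] = lambda src, dst: 1.0,
    max_per_source: int = MAX_PER_SOURCE,
) -> float:
    """Return the time at which the last of n_targets remote nodes is served.

    Node 0 holds the patched local copy; nodes 1..n_targets are remote targets.
    """
    pending = list(range(n_targets, 0, -1))  # pop() yields 1, 2, 3, ...
    active = {0: 0}  # source node -> in-flight outbound transfers
    events: list[tuple[float, int, int, int]] = []  # (finish, seq, src, dst)
    seq = itertools.count()
    now = 0.0

    def submit() -> None:
        # Greedy: give the next target to the least-loaded source until every
        # source is at its cap or nothing is left to send.
        while pending:
            src = min(active, key=active.__getitem__)
            if active[src] >= max_per_source:
                return
            dst = pending.pop()
            active[src] += 1
            heapq.heappush(events, (now + transfer_time(src, dst), next(seq), src, dst))

    submit()
    while events:
        now, _, src, dst = heapq.heappop(events)
        active[src] -= 1
        active[dst] = 0  # a finished node immediately becomes a source
        submit()
    return now
```

Passing a `transfer_time` that depends on `dst` shows faster finishers being reused as sources first. The model ignores NIC and filesystem contention, which is exactly what bends the measured curve upward at larger node counts.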

Scaling on Aurora: 8 → 4096 nodes

Full 10-point sweep using the tarball broadcast mode (ezpz yeet --src .venv.tar.gz) on Aurora, measured 2026-04-30 to 2026-05-01. The benchmark harness lives in saforem2/torchtitan@ezpz along with the raw CSV and plotting script.

Each job ran ezpz yeet --src .venv.tar.gz, then 10 training steps of agpt_2b to verify that the broadcast venv was functional. first_step_seconds is the wall-clock from launching training (after yeet has finished) to the first training step, so it captures import and initialization cost on top of the yeet itself; yeet + first-step is a useful proxy for total time-to-train.

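| Nodes | yeet (s) | First-step (s) | Per-node (ms) |
|------:|---------:|---------------:|--------------:|
| 8 | 69.7 | 29.3 | 8,712 |
| 16 | 89.7 | 31.6 | 5,606 |
| 32 | 89.2 | 20.9 | 2,788 |
| 64 | 91.2 | 34.6 | 1,425 |
| 128 | 110.4 | 30.5 | 862 |
| 256 | 132.9 | 37.6 | 519 |
| 512 | 174.5 | 44.5 | 341 |
| 1024 | 255.4 | 60.8 | 249 |
| 2048 | 421.4 | 94.8 | 206 |
| 4096 | 750.6 | 194.0 | 183 |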

Figure: Total wall-clock

Figure: Per-node amortized cost

Two regimes

  • 8–64 nodes is extract-bound. Total wall-clock is roughly flat at 70–91 s; per-node cost falls 8.7 s → 1.4 s as more nodes share the fixed-cost local extraction.
  • ≥128 nodes is broadcast-bound. Total wall-clock grows with every doubling, faster than the O(log N) a pure tree would give but still sub-linearly in N. Each 2× in node count multiplies wall-clock by ~1.3–1.8×, and the factor itself keeps rising (256→512: 1.31×, 512→1024: 1.46×, 1024→2048: 1.65×, 2048→4096: 1.78×): the broadcast tree depth and per-leaf bandwidth contention both grow with scale.

Per-node amortized cost drops monotonically from 8.7 s/node at N=8 to 0.18 s/node at N=4096, a 48× efficiency gain over the sweep. Even at 4096 nodes, the largest scale tested, the pre-launch overhead is under 13 minutes (750.6 s).

First-step latency stays under a minute through 512 nodes (44.5 s), sits right at a minute at 1024 (60.8 s), and only really starts to climb at 2048+, consistent with init_process_group overhead growing with world size. At 4096 nodes the first step lands in 3 min 14 s, so total time-to-train (yeet + first step) is about 16 minutes.
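The derived numbers in this section (per-node cost, per-doubling ratios, the 48× gain and the ~16-minute total) follow directly from the table, and a few lines of Python reproduce them. The values are copied from the table; nothing else is assumed.

```python
# (nodes, yeet seconds, first-step seconds), copied from the table above
RESULTS = [
    (8, 69.7, 29.3), (16, 89.7, 31.6), (32, 89.2, 20.9), (64, 91.2, 34.6),
    (128, 110.4, 30.5), (256, 132.9, 37.6), (512, 174.5, 44.5),
    (1024, 255.4, 60.8), (2048, 421.4, 94.8), (4096, 750.6, 194.0),
]


def per_node_ms(nodes: int, yeet_s: float) -> float:
    """Amortized yeet cost per node, in milliseconds."""
    return 1000 * yeet_s / nodes


def doubling_ratios(results: list) -> dict:
    """Wall-clock multiplier between consecutive node counts."""
    return {f"{a[0]}->{b[0]}": round(b[1] / a[1], 2) for a, b in zip(results, results[1:])}


if __name__ == "__main__":
    for n, yeet, first in RESULTS:
        total_min = (yeet + first) / 60
        print(f"{n:5d} nodes: {per_node_ms(n, yeet):7.0f} ms/node, yeet + first step = {total_min:4.1f} min")
    print(doubling_ratios(RESULTS))
```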

Why tarball broadcast scales so much better than per-file rsync

The pre-tarball yeet mode (per-file rsync) was projected to take 1–2 hours at 256+ nodes, because per-file metadata operations on Lustre dominate the cost. Switching to a single compressed tarball (--compress or a pre-built --src foo.tar.gz) reduces the Lustre side to one sequential read regardless of node count, so the broadcast itself is the only part that scales with N.
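To get a rough sense of how much metadata work a per-file copy implies for your own environment, count its entries. Each file or directory needs at least one stat() per per-file pass (a lower bound; rsync may do more), whereas the tarball path reads a single file. `count_entries` is a hypothetical helper; os.walk does not descend into symlinked directories by default.

```python
from __future__ import annotations

import os


def count_entries(root: str | os.PathLike) -> tuple[int, int]:
    """Return (files, directories) under root; symlinked dirs are counted, not descended."""
    n_files = n_dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_files, n_dirs
```

For example, `count_entries(sys.prefix)` on the active venv gives the number of entries a per-file pass over it has to touch.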

Reproducing

The benchmark submits one PBS job per node count. PBS limits concurrent submissions per user, so chain via qsub -W depend or a polling wrapper:

# 8/16/32/64/128/256 → debug-scaling
for N in 8 16 32 64 128 256; do
    qsub -q debug-scaling -l select=$N -l walltime=00:30:00 \
        -N yeet-n$N \
        torchtitan/experiments/ezpz/scripts/yeet_env_scaling_test.sh
done

# 512/1024 → prod (auto-routes to small)
for N in 512 1024; do
    qsub -q prod -l select=$N -l walltime=01:00:00 \
        -N yeet-n$N \
        torchtitan/experiments/ezpz/scripts/yeet_env_scaling_test.sh
done

# 2048/4096 → prod (auto-routes to prod-large)
for N in 2048 4096; do
    qsub -q prod -l select=$N -l walltime=01:00:00 \
        -N yeet-n$N \
        torchtitan/experiments/ezpz/scripts/yeet_env_scaling_test.sh
done

# Plot once results have landed
python3 torchtitan/experiments/ezpz/docs/scaling/yeet_env/plot_yeet_env_scaling.py

Each job gets a 30-minute walltime (1 h for 512 nodes and up, matching the loops above), times the yeet step, then runs 10 training steps of agpt_2b to verify that the broadcast venv works. Results land in .yeet-env-scaling-results.csv in the repo root.
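The loops above submit everything at once. One way to chain them, as suggested, is to pass each job's ID to the next submission with PBS's `-W depend=afterany:<jobid>`, so a job only becomes eligible after the previous one has ended. A minimal sketch, assuming qsub prints the new job ID on stdout and reusing the queue/walltime split from the loops; `submit_chain` is a hypothetical helper and defaults to a dry run that submits nothing.

```python
from __future__ import annotations

import shlex
import subprocess

SCRIPT = "torchtitan/experiments/ezpz/scripts/yeet_env_scaling_test.sh"


def queue_and_walltime(nodes: int) -> tuple[str, str]:
    # Same split as the loops above: debug-scaling up to 256 nodes, prod beyond.
    return ("debug-scaling", "00:30:00") if nodes <= 256 else ("prod", "01:00:00")


def submit_chain(node_counts: list[int], dry_run: bool = True) -> list[list[str]]:
    """Submit one job per node count, each depending on the previous one."""
    cmds: list[list[str]] = []
    prev = None
    for n in node_counts:
        queue, walltime = queue_and_walltime(n)
        cmd = ["qsub", "-q", queue, "-l", f"select={n}", "-l", f"walltime={walltime}", "-N", f"yeet-n{n}"]
        if prev is not None:
            cmd += ["-W", f"depend=afterany:{prev}"]  # start after prev ends, pass or fail
        cmd.append(SCRIPT)
        cmds.append(cmd)
        print(shlex.join(cmd))
        if dry_run:
            prev = f"<jobid:yeet-n{n}>"  # placeholder; nothing is submitted
        else:
            out = subprocess.run(cmd, check=True, capture_output=True, text=True)
            prev = out.stdout.strip()  # job ID of the job just submitted
    return cmds


if __name__ == "__main__":
    submit_chain([8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096])
```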

Complete workflow

# 1. Get an interactive allocation
qsub -A <project> -q debug -l select=2 -l walltime=01:00:00 -I

# 2. Distribute the environment
ezpz yeet

# 3. Activate the local copy
deactivate 2>/dev/null
source /tmp/<env-name>/bin/activate

# 4. Launch from a shared filesystem path
cd /path/to/your/project
ezpz launch python3 -m your_app.train

See also
