
🍋 ezpz: distributed PyTorch across any hardware

A history and overview of ezpz, with AMD and Intel PyTorch enablement timelines and why portable distributed training across GPU vendors is finally possible.

For most of PyTorch’s first decade, “running PyTorch” effectively meant “running PyTorch on NVIDIA”. Every distributed training script, every profiler, every example notebook assumed CUDA. If you wanted to run the same code on AMD or Intel hardware, you were either going to rewrite a launch script, port a kernel, or maintain a vendor-specific fork, and often all three.

That picture has changed faster than most people realize. In the last two years, PyTorch gained native Intel GPU support, AMD shipped day-zero ROCm builds for every PyTorch release, and Intel’s out-of-tree extension is now finishing its phased shutdown.[1] You can write one PyTorch script today and run it across NVIDIA, AMD, and Intel hardware with no code changes, if you handle the launch / environment / device-init differences.

That last “if” is what ezpz exists to absorb. This post is mostly about how the vendor landscape got here, and a little about what that means for the launcher.

The two timelines

The clearest way to see the shift is side-by-side: AMD’s gradual ROCm-everywhere strategy, and Intel’s faster but later push to merge IPEX into upstream PyTorch.

Figure: AMD and Intel PyTorch Enablement Timeline (Gantt chart)

AMD ROCm and PyTorch

  • 2012–2016: Torch7 era and early CUDA to HIP ports
  • 2016–2020: ROCm 1.0 and HIPIFY tooling
  • 2021–2022: Official PyTorch ROCm Python packages
  • 2022–2023: PyTorch Foundation governance participation
  • 2023–2024: Triton ecosystem support
  • 2024: MI300x PyTorch guidance

Intel and PyTorch

  • 2018–2019: Initial PyTorch contributions
  • 2020–2024: Intel Extension for PyTorch launch
  • 2022: VTune ITT API integration in PyTorch (milestone)
  • 2023: PyTorch Foundation Premier membership (milestone)
  • 2024: Prototype native Intel GPU support (milestone)
  • 2025: Solid native Intel GPU support (milestone)
  • 2025: IPEX feature upstreaming completion (milestone)
  • 2026: Intel Extension for PyTorch end of life (critical milestone)

Lining the AMD and Intel work up against the actual PyTorch release cadence is illuminating, because most of the integration milestones land on specific PyTorch versions:

Figure: PyTorch Vendor Integration Timeline, AMD vs Intel (Gantt chart)

AMD

  • 2021-03-04: Installable PyTorch ROCm Python packages
  • 2022-06-28: ROCm marked stable

PyTorch Releases

  • 1.8: 2021-03-04
  • 1.12: 2022-06-28
  • 2.0: 2023-03-15
  • 2.4: 2024-07-24
  • 2.5: 2024-10-17
  • 2.6: 2025-01-29
  • 2.7: 2025-04-23
  • 2.8: 2025-08-06
  • 2.9: 2025-10-15
  • 2.10: 2026-01-15

Intel

  • 2024-07-24: Intel GPU improvements begin
  • 2024-10-17: Native Intel GPU support in 2.5
  • 2025-04-23: Intel GPU eager/compile parity in 2.7
  • 2025-08-06: Intel XCCL backend in 2.8
  • 2025-08-06 to 2026-03-31: IPEX discontinued
  • 2026-03-31: IPEX end of life (milestone)

Heads up: Intel’s separate IPEX project reaches end-of-life in March 2026. By then, native PyTorch is the only supported path on Intel GPUs.

AMD: a long, quiet build-up

AMD’s path to first-class PyTorch support is a 14-year project that mostly happened out of view. The pre-history goes back to the Torch7 era (well before PyTorch existed in its current form), and it’s not an accident that ROCm landed on Caffe and Torch7 first. AMD was building the porting story (HIP, HIPIFY, the C++ dialect, the toolchain) on the previous generation of frameworks before the new one became production-default.

That patience paid off in three big jumps:

  • 2021: installable wheels. Before March 2021, you couldn’t just pip install torch and get an AMD-compatible build. Once the ROCm Python packages went official, AMD became a one-line install on supported Linux systems, with the same UX as CUDA. PyTorch 1.8 was the first release with that working out of the box.
  • 2022: governance. AMD joined the PyTorch Foundation as a founding member when the project moved under the Linux Foundation. This was the point at which AMD’s integration stopped being “a vendor patch” and started being a co-owned roadmap.
  • 2023: day-zero. With PyTorch 2.0, AMD shipped same-day support on the ROCm 5.x stack (ROCm 6.0 only followed at the end of 2023), including TorchDynamo / TorchInductor on AMD hardware. This was the first release where you could pick up a fresh PyTorch and have AMD work immediately, with no lag and no porting window.

The rest of the timeline is filling in the corners: OpenAI Triton support arrived in 2023, MI300x guidance in mid-2024, native PyTorch on Windows for consumer Radeon cards in late 2025. The overall trajectory is clear: AMD is no longer playing catch-up on the framework. The remaining gaps are about specific kernels, FlashAttention variants, and custom collectives, all work that lives in extensions, not in PyTorch itself.

Intel: a much faster, much later push

Intel’s story is compressed into a much shorter window (basically four years vs AMD’s fourteen) because Intel arrived after the framework had already standardized. Instead of a slow, parallel ROCm-style stack, Intel went the out-of-tree extension route first (IPEX, 2020) and only started the upstream merge in earnest with PyTorch 2.4 in 2024.

The integration cadence has been remarkably tight:

  • 2.4 (Jul 2024): first prototype native Intel GPU support
  • 2.5 (Oct 2024): native Intel GPU support lands
  • 2.7 (Apr 2025): eager + torch.compile parity on Intel GPUs
  • 2.8 (Aug 2025): XCCL collective backend; IPEX active development ceases
  • 2.10 / Mar 2026: IPEX project reaches end-of-life

Notable to me: Intel chose to finish upstreaming before retiring the extension. The IPEX EOL date isn’t where the work stops; it’s where the redundancy stops. The features have already moved.

What this means in practice

If you’re writing a new training script today (early 2026), the boilerplate problem has shifted. Most of the effort used to go into:

  1. Picking the right torch.distributed backend (nccl, gloo, xccl, …; on ROCm builds the nccl backend is backed by RCCL).
  2. Knowing which environment variables your launcher expects on this particular cluster (MASTER_ADDR, WORLD_SIZE, LOCAL_RANK, PALS_*, PMI_*, OMPI_*, SLURM_*, …).
  3. Handling per-vendor device init quirks (torch.cuda.set_device on NVIDIA and on AMD ROCm builds, which reuse the torch.cuda namespace, vs torch.xpu.set_device on Intel).
  4. Then, finally, the model code.
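To make steps 1 and 3 concrete, here is a minimal sketch of the selection logic that scripts used to carry by hand. The helper names (`probe_device_type`, `backend_for`) are hypothetical, and the backend mapping is an assumption about a typical setup, not code taken from ezpz. The `xccl` backend in particular needs a PyTorch build that ships it, and ROCm builds report AMD GPUs through `torch.cuda`, so they fall into the `cuda` branch.

```python
# Hypothetical helpers: a sketch of hand-written vendor detection,
# not code from ezpz or from PyTorch itself.

# Assumed mapping from device type to torch.distributed backend name.
# ROCm builds expose AMD GPUs as "cuda" and back "nccl" with RCCL.
BACKENDS = {"cuda": "nccl", "xpu": "xccl", "cpu": "gloo"}


def probe_device_type() -> str:
    """Return "cuda", "xpu", or "cpu" depending on what is usable."""
    try:
        import torch
    except ImportError:
        return "cpu"  # lets the sketch run where torch is not installed
    if torch.cuda.is_available():  # NVIDIA, or AMD through ROCm/HIP
        return "cuda"
    xpu = getattr(torch, "xpu", None)  # attribute is missing on older builds
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


def backend_for(device_type: str) -> str:
    """Pick a collective backend, falling back to gloo for unknown types."""
    return BACKENDS.get(device_type, "gloo")


device_type = probe_device_type()
print(device_type, backend_for(device_type))
```

Every script that wanted to be portable ended up with some variant of this, usually with more special cases per cluster.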

Steps 1–3 are now almost the same across vendors. The collective backends mostly map to the right thing automatically. The device abstraction is unified under torch.accelerator (in 2.7+). What’s left is mostly the launch boilerplate, which is what 🍋 ezpz takes care of:

  • ezpz launch figures out the launcher (mpiexec, srun, torchrun, deepspeed) from the environment.
  • ezpz_setup_* shell helpers normalize the rank/size variables across PBS / SLURM / standalone.
  • ezpz yeet distributes your environment to every node so you don’t pay the Lustre-import tax (covered in Running 50k Python Processes on Aurora).
  • The Python entry points stay vendor-agnostic; device init goes through one helper that picks cuda / xpu / hip based on what’s actually available.
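For a sense of what "normalizing the rank/size variables" and "figuring out the launcher" involve, here is a rough sketch in plain Python. It is not ezpz's implementation: `RankInfo`, `read_rank_info`, `guess_launcher`, the lookup order, and the `torchrun` fallback are all hypothetical, and the variable names are assumptions based on common launcher conventions that should be checked against your own cluster.

```python
import shutil
from typing import Callable, Mapping, NamedTuple, Optional


class RankInfo(NamedTuple):
    rank: int
    world_size: int
    local_rank: int


# (rank, world size, local rank) variable names, tried in order.
# Assumed from common launcher conventions; check your own cluster.
SCHEMES = (
    ("RANK", "WORLD_SIZE", "LOCAL_RANK"),  # torchrun
    ("PMI_RANK", "PMI_SIZE", "PMI_LOCAL_RANK"),  # MPICH-style / PALS
    (
        "OMPI_COMM_WORLD_RANK",
        "OMPI_COMM_WORLD_SIZE",
        "OMPI_COMM_WORLD_LOCAL_RANK",
    ),  # Open MPI
    ("SLURM_PROCID", "SLURM_NTASKS", "SLURM_LOCALID"),  # srun
)


def read_rank_info(env: Mapping[str, str]) -> RankInfo:
    """Normalize launcher-specific variables into one tuple."""
    for rank_key, size_key, local_key in SCHEMES:
        if rank_key in env and size_key in env:
            return RankInfo(
                int(env[rank_key]),
                int(env[size_key]),
                int(env.get(local_key, "0")),
            )
    return RankInfo(0, 1, 0)  # no launcher detected: single process


def guess_launcher(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Guess a launcher from scheduler variables and what is on PATH."""
    if "PBS_JOBID" in env and which("mpiexec"):
        return "mpiexec"
    if "SLURM_JOB_ID" in env and which("srun"):
        return "srun"
    return "torchrun"  # assumed fallback for standalone runs


slurm_env = {"SLURM_PROCID": "3", "SLURM_NTASKS": "8", "SLURM_LOCALID": "1"}
print(read_rank_info(slurm_env))
# prints: RankInfo(rank=3, world_size=8, local_rank=1)
print(read_rank_info({}))
# prints: RankInfo(rank=0, world_size=1, local_rank=0)
```

In a real script you would pass `os.environ` to both functions. Taking the environment as an argument keeps the logic easy to test against fake PBS or SLURM variables.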

The point isn’t that ezpz is doing anything magical; it’s that the framework finally caught up enough that a small, vendor-agnostic launcher can exist at all. Five years ago, this post would have been about writing per-vendor shims. Today it’s about deleting them.
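As a rough illustration of what replaces a per-vendor shim, the sketch below asks `torch.accelerator` which device type is present instead of probing each vendor namespace in turn. The function name `default_device_type` is hypothetical, and the code assumes a recent PyTorch where `torch.accelerator.is_available()` and `torch.accelerator.current_accelerator()` exist; on older builds, or without torch installed, it falls back to "cpu".

```python
def default_device_type() -> str:
    """Return the accelerator type PyTorch reports, or "cpu" as a fallback."""
    try:
        import torch
    except ImportError:
        return "cpu"  # lets the sketch run where torch is not installed
    accel = getattr(torch, "accelerator", None)  # absent on older releases
    if accel is None or not accel.is_available():
        return "cpu"
    # current_accelerator() returns a torch.device such as device("cuda")
    return accel.current_accelerator().type


print(default_device_type())
```

With that in place, model code can use something like `torch.zeros(4, device=default_device_type())` and stay unchanged across vendors.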

Detailed timelines

For reference, the full chronology:

AMD

  • Pre-2021: Torch7 era and CUDA→HIP ports. Torch7 was released in 2012 as a precursor to PyTorch (C++ + CUDA). With ROCm 1.0, AMD demonstrated CUDA→HIP conversion using HIPIFY, including ports of Caffe and Torch7.
  • March 2021: PyTorch for AMD ROCm becomes officially available as a Python package on supported Linux systems.
  • September 2022: PyTorch joins the Linux Foundation; AMD is a founding member of the PyTorch Foundation governing board.
  • April 2023: AMD ships day-zero support for PyTorch 2.0 within the ROCm 5.x ecosystem, including TorchDynamo/TorchInductor.
  • 2023: OpenAI Triton support extended to AMD GPUs.
  • June 2024: MI300x PyTorch guidance published, with near drop-in compatibility for code written for NVIDIA GPUs.
  • October 2024: How-to guide for Torchtune (PyTorch LLM fine-tuning library) on AMD GPUs.
  • September 2025: Public preview of PyTorch on Windows for select consumer Radeon RX 7000/9000 series GPUs and Ryzen AI APUs (no WSL2 needed).
  • November 2025: AMD Software: PyTorch on Windows Edition 7.1.1 with ROCm 7.1.1.
  • 2026 / post-2026: MI450X rack-scale solution targeting NVIDIA high-end parity in H2 2026; MI500 series in development.

Intel

  • 2018: Intel begins contributing to upstream PyTorch.
  • 2020: Intel Extension for PyTorch (IPEX) launches as a separate package for Intel CPUs and GPUs.
  • October 2022 [2]: PyTorch 1.13 ships with integrated Intel VTune ITT API support.
  • August 2023 [3]: Intel joins the PyTorch Foundation as a Premier member.
  • July 2024: PyTorch 2.4 with prototype native Intel GPU support (client + data center).
  • April 2025: PyTorch 2.7 establishes solid Intel GPU support in both eager and graph modes (torch.compile) on Windows and Linux.
  • August 2025: IPEX active development ceases following the PyTorch 2.8 release; most features are upstreamed.
  • End of March 2026 (planned): IPEX reaches end-of-life. Use native PyTorch directly.

Footnotes

  1. Even now, in 2026, plenty of code is still NVIDIA-centric and is rarely designed with multi-platform support in mind, but the framework no longer is.

  2. PyTorch 1.13 release

  3. Intel Joins the PyTorch Foundation
