ezpz: distributed PyTorch across any hardware
A history and overview of ezpz, with AMD and Intel PyTorch enablement timelines and why portable distributed training across GPU vendors is finally possible.
For most of PyTorch's first decade, "running PyTorch" effectively meant "running PyTorch on NVIDIA". Every distributed training script, every profiler, every example notebook assumed CUDA. If you wanted to run the same code on AMD or Intel hardware, you were going to rewrite a launch script, port a kernel, or maintain a vendor-specific fork, and often all three.
That picture has changed faster than most people realize. In the last two years, PyTorch gained native Intel GPU support, AMD shipped day-zero ROCm builds for every PyTorch release, and Intel's out-of-tree extension is now finishing its phased shutdown.[1] You can write one PyTorch script today and run it across NVIDIA, AMD, and Intel hardware with no code changes, if you handle the launch / environment / device-init differences.
That last "if" is what ezpz exists to absorb. This post is
mostly about how the vendor landscape got here, and a little about what
that means for the launcher.
The two timelines
The clearest way to see the shift is side-by-side: AMD's gradual ROCm-everywhere strategy, and Intel's faster but later push to merge IPEX into upstream PyTorch.
Lining the AMD and Intel work up against the actual PyTorch release cadence is illuminating: most of the integration milestones land on specific PyTorch versions.
Heads up: Intel's separate IPEX project reaches end-of-life in March 2026. From then on, native PyTorch is the only supported path on Intel GPUs.
AMD: a long, quiet build-up
AMD's path to first-class PyTorch support is a 14-year project that mostly happened out of view. The pre-history goes back to the Torch7 era, well before PyTorch existed in its current form, and it's not an accident that ROCm landed on Caffe and Torch7 first. AMD was building the porting story (HIP, HIPIFY, the C++ dialect, the toolchain) on the previous generation of frameworks before the new one became production-default.
That patience paid off in three big jumps:
- 2021: installable wheels. Before March 2021, you couldn't just pip install torch and get an AMD-compatible build. Once the ROCm Python packages went official, AMD became a one-line install on supported Linux systems, the same UX as CUDA. PyTorch 1.8 was the first release with that working out of the box.
- 2022: governance. AMD joined the PyTorch Foundation as a founding member when the project moved under the Linux Foundation. This was the point at which AMD's integration stopped being "a vendor patch" and started being a co-owned roadmap.
- 2023: day-zero. With PyTorch 2.0, AMD shipped ROCm 6.0 with same-day support, including TorchDynamo / TorchInductor on AMD hardware. This was the first release where you could pick up a fresh PyTorch and have AMD work immediately, with no lag and no porting window.
The rest of the timeline is filling in the corners: OpenAI Triton support arrived in 2023, MI300X guidance in mid-2024, native PyTorch on Windows for consumer Radeon cards in late 2025. The overall trajectory is clear: AMD is no longer playing catch-up on the framework. The remaining gaps are about specific kernels, FlashAttention variants, and custom collectives, which is work that lives in extensions, not in PyTorch itself.
Intel: a much faster, much later push
Intel's story is compressed into a much shorter window (basically four years vs AMD's fourteen) because Intel arrived after the framework had already standardized. Instead of a slow, parallel ROCm-style stack, Intel went the out-of-tree extension route first (IPEX, 2020) and only started the upstream merge in earnest with PyTorch 2.4 in 2024.
The integration cadence has been remarkably tight:
- 2.4 (Jul 2024): first prototype native Intel GPU support
- 2.5 (Oct 2024): solid native Intel GPU support landed
- 2.7 (Apr 2025): eager + torch.compile parity on Intel GPUs
- 2.8 (Aug 2025): XCCL collective backend; IPEX active development ceases
- 2.10 / Mar 2026: IPEX project reaches end-of-life
Notable to me: Intel chose to finish upstreaming before retiring the extension. The IPEX EOL date isn't where the work stops; it's where the redundancy stops. The features have already moved.
What this means in practice
If you're writing a new training script today (early 2026), the boilerplate problem has shifted. Most of the effort used to go into:
1. Picking the right torch.distributed backend (nccl, gloo, xccl, rccl, …).
2. Knowing which environment variables your launcher expects on this particular cluster (MASTER_ADDR, WORLD_SIZE, LOCAL_RANK, PALS_*, PMI_*, OMPI_*, SLURM_* …).
3. Handling per-vendor device init quirks (torch.cuda.set_device vs xpu.set_device vs hip.set_device).
4. Then, finally, the model code.
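To make the environment-variable item concrete, here is a minimal sketch of how a script might normalize rank, world size, and local rank across launchers. It is not ezpz's implementation: the RankInfo type, the detect_rank_info function, the lookup order, and the exact per-launcher variable names are assumptions for illustration, and the names should be checked against what your launcher actually exports.

```python
import os
from typing import Mapping, NamedTuple, Optional


class RankInfo(NamedTuple):
    rank: int
    world_size: int
    local_rank: int
    source: str  # which launcher's variables were found


# (label, rank var, world-size var, local-rank var), checked in order.
# Assumed names; verify them on your own cluster.
_LAUNCHER_VARS = [
    ("torchrun", "RANK", "WORLD_SIZE", "LOCAL_RANK"),
    ("openmpi", "OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE",
     "OMPI_COMM_WORLD_LOCAL_RANK"),
    ("pmi", "PMI_RANK", "PMI_SIZE", "PMI_LOCAL_RANK"),
    ("slurm", "SLURM_PROCID", "SLURM_NTASKS", "SLURM_LOCALID"),
]


def detect_rank_info(env: Optional[Mapping[str, str]] = None) -> RankInfo:
    """Return rank info from the first launcher whose variables are set."""
    env = os.environ if env is None else env
    for source, rank_var, size_var, local_var in _LAUNCHER_VARS:
        if rank_var in env and size_var in env:
            # Simplification: if no local rank is exported, assume 0.
            # A real launcher wrapper would derive it from the node layout.
            local_rank = int(env.get(local_var, 0))
            return RankInfo(int(env[rank_var]), int(env[size_var]),
                            local_rank, source)
    # Nothing set: treat this as a single-process run.
    return RankInfo(0, 1, 0, "standalone")
```

Passing the environment as a mapping keeps the function easy to test without touching os.environ; in a real script you would call detect_rank_info() with no arguments.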
The first three are now almost the same across vendors. The collective
backends mostly map to the right thing automatically. The device
abstraction is unified under torch.accelerator (in 2.7+). What's left
is mostly the launch boilerplate, which is what ezpz
takes care of:
- ezpz launch figures out the launcher (mpiexec, srun, torchrun, deepspeed) from the environment.
- ezpz_setup_* shell helpers normalize the rank/size variables across PBS / SLURM / standalone.
- ezpz yeet distributes your environment to every node so you don't pay the Lustre-import tax (covered in Running 50k Python Processes on Aurora).
- The Python entry points stay vendor-agnostic; device init goes through one helper that picks cuda / xpu / hip based on what's actually available.
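As a rough illustration of the "one helper" idea in the last item, the sketch below picks a device type and a matching collective backend. It is not ezpz's code: detect_device_type, pick_backend, and the backend mapping are hypothetical names and choices made for this example. It relies on two things worth stating: ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API (so they report as cuda and use the nccl backend name), and the xccl backend is only available in builds that include it (2.8+ per the timeline above).

```python
def detect_device_type() -> str:
    """Best-effort guess at the accelerator type visible to this process."""
    try:
        import torch
    except ImportError:
        # No PyTorch installed: nothing to initialize, report CPU.
        return "cpu"

    # Newer PyTorch releases expose a unified accelerator namespace.
    accel = getattr(torch, "accelerator", None)
    if accel is not None and accel.is_available():
        return accel.current_accelerator().type

    # Fallbacks for older releases. ROCm builds answer via torch.cuda.
    if torch.cuda.is_available():
        return "cuda"
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


# Assumed mapping from device type to torch.distributed backend name.
_BACKENDS = {"cuda": "nccl", "xpu": "xccl", "cpu": "gloo"}


def pick_backend(device_type: str) -> str:
    """Map a device type to a backend, defaulting to gloo if unknown."""
    return _BACKENDS.get(device_type, "gloo")
```

A script would then call something like torch.distributed.init_process_group(backend=pick_backend(detect_device_type())) once the rank and world-size variables are in place.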
The point isn't that ezpz is doing anything magical; it's that the
framework finally caught up enough that a small, vendor-agnostic
launcher can exist at all. Five years ago, this post would have been
about writing per-vendor shims. Today it's about deleting them.
Detailed timelines
For reference, the full chronology:
AMD
- Pre-2021: Torch7 era and CUDA-to-HIP ports. Torch7 was released in 2012 as a precursor to PyTorch (C++ + CUDA). With ROCm 1.0, AMD demonstrated CUDA-to-HIP conversion using HIPIFY, including ports of Caffe and Torch7.
- March 2021: PyTorch for AMD ROCm becomes officially available as a Python package on supported Linux systems.
- September 2022: PyTorch joins the Linux Foundation; AMD is a founding member of the PyTorch Foundation governing board.
- April 2023: AMD ships day-zero support for PyTorch 2.0 within the ROCm 6.0 ecosystem, including TorchDynamo/TorchInductor.
- 2023: OpenAI Triton support extended to AMD GPUs.
- June 2024: MI300X PyTorch guidance published, with near drop-in compatibility for code written for NVIDIA GPUs.
- October 2024: How-to guide for Torchtune (PyTorch LLM fine-tuning library) on AMD GPUs.
- September 2025: Public preview of PyTorch on Windows for select consumer Radeon RX 7000/9000 series GPUs and Ryzen AI APUs (no WSL2 needed).
- November 2025: AMD Software: PyTorch on Windows Edition 7.1.1 with ROCm 7.1.1.
- 2026 / post-2026: MI450X rack-scale solution targeting NVIDIA high-end parity in H2 2026; MI500 series in development.
Intel
- 2018: Intel begins contributing to upstream PyTorch.
- 2020: Intel Extension for PyTorch (IPEX) launches as a separate package for Intel CPUs and GPUs.
- October 2022: PyTorch 1.13 ships with integrated Intel VTune ITT API support.
- August 2023: Intel joins the PyTorch Foundation as a Premier member.
- July 2024: PyTorch 2.4 with prototype native Intel GPU support (client + data center).
- April 2025: PyTorch 2.7 establishes solid Intel GPU support in both eager and graph modes (torch.compile) on Windows and Linux.
- August 2025: IPEX active development ceases following the PyTorch 2.8 release; most features are upstreamed.
- End of March 2026 (planned): IPEX reaches end-of-life. Use native PyTorch directly.
Footnotes
1. Even now, in 2026, plenty of code is still NVIDIA-centric and is rarely designed with multi-platform support in mind, but the framework no longer is.