Training Foundation Models on Supercomputers
Sam Foreman 2025-10-24
- 🧑🏻💻 About Me
- Argonne Leadership Computing Facility (ALCF)
- 🌌 AuroraGPT (2024–)
- 🧬 MProt-DPO
- 🌎 AERIS (2025)
- 📓 References
- ❤️ Acknowledgements
- Extras
🧑🏻💻 About Me
- 🏡 samforeman.me
- UIUC (2015):
- Engineering Physics + Applied Mathematics
- University of Iowa (2015–2019):
- PhD, Physics
- ANL (2019–2022): Postdoctoral Researcher
- ANL (2022–Present): Assistant Computational Scientist
- Member of the AI/ML Group at ALCF
Current Research:
- AuroraGPT: Foundation Models for Science
- AERIS: Argonne’s Earth System Model
- Finalist for the 2025 ACM Gordon Bell Prize in Climate Modeling
- MProt-DPO: Multimodal Protein Design
- Finalist for the 2024 ACM Gordon Bell Prize
- GenSLMs: Genome Scale Language Models.
Argonne Leadership Computing Facility (ALCF)
The ALCF enables breakthroughs in science and engineering by providing supercomputing resources and expertise to the research community.
–alcf.anl.gov


🌀 Sequence-Window-Pipeline Parallelism (SWiPe)
SWiPe is a novel parallelism strategy for Swin-based Transformers: a hybrid 3D parallelism strategy, combining:
- Sequence parallelism (SP)
- Window parallelism (WP)
- Pipeline parallelism (PP)
Figure 17
Figure 18: SWiPe Communication Patterns
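To make the idea of combining three parallelism axes concrete, here is a minimal sketch of how global ranks could be laid out on an SP × WP × PP grid and split into communication groups. This is a generic 3D layout for illustration, not the AERIS implementation: the axis ordering (SP varying fastest, PP slowest) and the helper names `rank_to_coords` and `groups_along` are assumptions.

```python
def rank_to_coords(rank: int, sp: int, wp: int) -> tuple[int, int, int]:
    """Return hypothetical (pp_idx, wp_idx, sp_idx) grid coordinates for a global rank.

    Assumed layout: SP is the fastest-varying axis, then WP, then PP.
    """
    return rank // (sp * wp), (rank // sp) % wp, rank % sp


def groups_along(axis: int, sp: int, wp: int, pp: int) -> list[list[int]]:
    """Group ranks that differ only along `axis` (0 = PP, 1 = WP, 2 = SP)."""
    buckets: dict[tuple[int, ...], list[int]] = {}
    for rank in range(sp * wp * pp):
        coords = rank_to_coords(rank, sp, wp)
        # Ranks sharing every coordinate except `axis` communicate together.
        key = coords[:axis] + coords[axis + 1:]
        buckets.setdefault(key, []).append(rank)
    return list(buckets.values())


if __name__ == "__main__":
    sp, wp, pp = 2, 2, 2  # toy sizes, 8 ranks total
    print("SP groups:", groups_along(2, sp, wp, pp))
    print("WP groups:", groups_along(1, sp, wp, pp))
    print("PP groups:", groups_along(0, sp, wp, pp))
```

In a real job, each list of ranks would typically be handed to the framework's process-group constructor, so that collectives for one axis only involve the ranks in that group.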
🚀 AERIS: Scaling Results
Figure 19: AERIS: Scaling Results
- 10 EFLOPs (sustained) @ 120,960 GPUs
- See Hatanpää et al. (2025) for additional details
- arXiv:2509.13523
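For a rough sense of scale, the headline numbers above can be turned into an average per-GPU throughput. This is back-of-the-envelope arithmetic only: it takes the rounded 10 EFLOPS figure at face value and assumes the work is spread evenly over all 120,960 GPUs.

```python
sustained_flops = 10e18   # 10 EFLOPS sustained (rounded figure from the slide)
n_gpus = 120_960          # GPU count from the slide

# Average throughput per GPU, in TFLOP/s (assumes a perfectly even split).
per_gpu_tflops = sustained_flops / n_gpus / 1e12
print(f"{per_gpu_tflops:.1f} TFLOP/s per GPU")  # prints "82.7 TFLOP/s per GPU"
```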
🌪️ Hurricane Laura

Figure 20: Hurricane Laura tracks (top) and intensity (bottom). Initialized 7 (a), 5 (b), and 3 (c) days prior to 2020-08-28T00z.
📓 References
Dharuman, Gautham, Kyle Hippe, Alexander Brace, et al. 2024. “MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization.” Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (Atlanta, GA, USA), SC ’24. https://doi.org/10.1109/SC41406.2024.00013.
Hatanpää, Väinö, Eugene Ku, Jason Stock, et al. 2025. AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions. https://arxiv.org/abs/2509.13523.
Price, Ilan, Alvaro Sanchez-Gonzalez, Ferran Alet, et al. 2024. GenCast: Diffusion-Based Ensemble Forecasting for Medium-Range Weather. https://arxiv.org/abs/2312.15796.
Song, Shuaiwen Leon, Bonnie Kruft, Minjia Zhang, et al. 2023. DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery Through Sophisticated AI System Technologies. https://arxiv.org/abs/2310.04610.
❤️ Acknowledgements
This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.