📝 2025 Annual Report
Sam Foreman 2025-09-22
- Goals for Next Year (2026)
- Goals from Last Year (2024)
- Contributions to ALCF
- Publications
- Presentations
- Posts
- Organizational Efforts
- Mentoring
- Scientific / Technical Accomplishments
- References
Goals for Next Year (2026)
- Build out generic training services for science teams
- Continue to push on resilient / fault-tolerant training techniques
Goals from Last Year (2024)
- Continue to contribute to division- and lab-wide efforts
- Continue to work with application teams to efficiently scale on ALCF systems
- [WIP] Publish retrospective on initial pre-training of AuroraGPT
Contributions to ALCF
- AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions
  - ACM Gordon Bell Prize Finalist (co-author)
  - Contributed to model development, performance analysis, and scaling studies
- MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design with DPO
  - Finalist for the 2024 ACM Gordon Bell Prize (first-author)
- AuroraGPT
  - Co-lead of the Models and Training team (with Venkat Vishwanath)
  - Ongoing writeup of pre-training efforts
  - Successfully pre-trained:
    - AuroraGPT-7B on 2T tokens
    - AuroraGPT-2B on 4T tokens (ongoing)
- Catalyst for:
  - Arvind Ramanathan’s INCITE Project (FoundEpidem)
  - Zheng Zhang’s ALCC Project
  - Rao Kotamarthi’s ALCC Project
- Member of the Software Committee
- Intro to HPC Undergraduate Bootcamp:
  - Project lead for Intro to {AI, HPC} for Science
Publications
- AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions (Hatanpää et al. 2025)
- Aurora: Architecting Argonne’s First Exascale Supercomputer for Accelerated Scientific Discovery (Allen et al. 2025)
- HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights (Gokdemir et al. 2025)
- Automated Tuning for HMC Mass Ratios (Torsiello et al. 2025)
- MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow (Yan et al. 2025)
- MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design with DPO (Dharuman et al. 2024)
Presentations
- Scientific AI at Scale: AI for Science @ Open SkAI 2025
- Scientific AI at Scale: Distributed Training @ Open SkAI 2025
- Large Scale Training on Diverse Accelerators @ Scalable Deep Learning, SIAM AN2025
- LLMs on Aurora: 🌌 AuroraGPT @ 2025 ALCF INCITE GPU Hackathon
- LLMs on Aurora: 🍋 ezpz @ 2025 ALCF INCITE GPU Hackathon
- AuroraGPT: Foundation Models for Science @ Foundation Models for the Electric Grid
- Parallel Training Methods @ AI-for-Science on Supercomputers
- AuroraGPT @ 2024 ALCF Hands-On HPC Workshop
- Machine Learning and Foundation Models at Scale @ 2024 ALCF Hands-On HPC Workshop
Posts
- 📊 pbs-tui: TUI for PBS Job Scheduler Monitoring
- 🍹 BlendCorpus + TorchTitan @ ALCF
- 🏗️ Building PyTorch 2.8 from Source on Aurora
- 🚧 Frameworks Issue with numpy > 2
- 🔥 Building PyTorch 2.6 from Source on Aurora
- 🪛 Torchtune on Aurora
- 🚑 Torchtune Patch on Aurora
- 💾 Converting Checkpoints
Organizational Efforts
- Organizer for:
- Served as reviewer for:
  - HiPC 2025
  - SPIGM @ NeurIPS
  - ML4PS Workshop @ NeurIPS’24
  - AI4Science Workshop @ NeurIPS’24
  - GenBio Workshop @ NeurIPS’24
  - AI4Science Workshop @ ICML’24
Mentoring
- Khalid Hossain: Supported Khalid’s successful transition from postdoc to staff
- Joseph Frimpong: Postdoc in the Center for Nanoscale Materials
- Hung Nguyen: Graduate student @ UIUC
Scientific / Technical Accomplishments
References
Allen, Benjamin S., James Anchell, Victor Anisimov, et al. 2025. Aurora: Architecting Argonne’s First Exascale Supercomputer for Accelerated Scientific Discovery. https://arxiv.org/abs/2509.08207.
Dharuman, Gautham, Kyle Hippe, Alexander Brace, et al. 2024. “MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization.” Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (Atlanta, GA, USA), SC ’24. https://doi.org/10.1109/SC41406.2024.00013.
Gokdemir, Ozan, Carlo Siebenschuh, Alexander Brace, et al. 2025. HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights. https://arxiv.org/abs/2505.04846.
Hatanpää, Väinö, Eugene Ku, Jason Stock, et al. 2025. AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions. https://arxiv.org/abs/2509.13523.
Torsiello, J., G. T. Fleming, S. Foreman, X.-Y. Jin, and J. C. Osborn. 2025. “Automated Tuning for HMC Mass Ratios.” In PoS. https://doi.org/10.22323/1.466.0052.
Yan, Xiaoli, Nathaniel Hudson, Hyun Park, et al. 2025. MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow. https://arxiv.org/abs/2501.10651.