
AuroraGPT

Sam Foreman 2024-09-04

🎯 AuroraGPT Goals

AuroraGPT: general-purpose scientific LLM, broadly trained on a general corpus plus scientific papers, texts, and data

  • Explore pathways towards a “Scientific Assistant” model
  • Build with international partners (RIKEN, BSC, others)
  • Multilingual: English, Japanese, French, German, Spanish
  • Multimodal: images, tables, equations, proofs, time series, graphs, fields, sequences, etc.

Image from Hannibal046/Awesome-LLM

Figure 1: Credit to the entire AuroraGPT team for slides.

  • Here to talk about AuroraGPT, Argonne’s internal effort to build a general-purpose scientific LLM, broadly trained on a general corpus of text + scientific {papers, text, data}

  • As part of this effort, we plan to…

    • Explore pathways, build with international partners, multi-{lingual, modal}
  • Rough timeline of the project and deliverables:

    • 202{3,4}: text-only models, plan to release a series of {7B, 70B, 1T} models
    • 202{4,5}: Basic multi-modal models
    • 202{5,6}: Advanced scientific multimodal models

🧪 AuroraGPT: Open Science Foundation Models

  • AuroraGPT will be a publicly distributed, open source foundation model for open science
  • Is being trained on:
    • Scientific / engineering structured data
    • General text, media, news, etc.
    • Large amounts of low- to medium-quality data
    • Much less high-quality data (that is publicly available for use)
  • This data is then cleaned, processed, de-duplicated, and used for the initial pre-training phase of the model (a minimal de-duplication sketch follows this list)
  • The vast majority of the overall compute is spent during this initial pre-training phase
    • This is the group I help to lead and will be talking a bit about today
  • The initial pre-training phase is currently underway
    • Eventually, given a bit of time, effort and magic, the model will be ready for fine-tuning and additional training for a variety of downstream tasks
  • The pretrained model will then be handed off for additional fine-tuning on a variety of downstream tasks
    • Scientific discovery
    • Accelerate scientific tasks
    • Digital twins
    • Inverse design
    • Code optimization
    • Accelerated simulations
    • Autonomous experiments
    • Co-design
  • Becoming increasingly clear that LLMs have the potential to drastically accelerate computational science
    • We’ve seen this already for {GenSLMs, Weather / Climate / Earth Systems Modeling, Particle Physics, etc.}
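
As a concrete (if simplified) illustration of the de-duplication step mentioned above, here is a minimal sketch based on exact hashing of normalized text. The corpus format and the "text" field name are assumptions for illustration; real pipelines typically layer fuzzy de-duplication (e.g. MinHash) on top of this:

```python
import hashlib

def dedupe_exact(documents):
    """Drop exact duplicates by hashing normalized text.

    `documents` is assumed to be an iterable of dicts with a "text" field.
    """
    seen = set()
    for doc in documents:
        normalized = " ".join(doc["text"].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc

corpus = [
    {"text": "The same sentence."},
    {"text": "the  same   sentence."},  # duplicate after normalization
    {"text": "A different sentence."},
]
print(len(list(dedupe_exact(corpus))))   # -> 2
```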

📊 AuroraGPT Outcomes

  • Datasets and data pipelines for preparing science training data
  • Software infrastructure and workflows to train, evaluate, and deploy LLMs at scale for scientific research purposes
  • Evaluation of state-of-the-art LLMs to determine where they fall short in deep scientific tasks and where deep data may have an impact
  • Assessment of the approach of augmenting web training data with two forms of data specific to science
    • Full text scientific papers
    • Structured scientific datasets (suitably mapped to narrative form)
  • Research-grade artifacts (models) for the scientific community to adapt for downstream uses
  • Promotion of responsible AI best practices where we can figure them out
  • International collaborations around the long-term goal of AGI for science
  • Deliverables:

    • datasets, pipelines
    • software infrastructure, workflows to interface with science applications
    • checkpoints, models, logs, workbook, insights, etc.
  • Hope to understand:

    • How different state-of-the-art models perform at different scientific tasks
    • Where deep data may have an impact
    • Feasibility of generically augmenting text with scientific structured data
  • Huge undertaking that will require large international collaborations around the long-term goal of AGI for science

  • Extra points:

    • Well known that LLMs are good for non-consequential tasks
    • Known to “hallucinate” and create false information
    • Can this be mitigated reliably?

🌌 Aurora

Table 1: Aurora Specs

  Racks   166
  Nodes   10,624
  CPUs    21,248
  GPUs    63,744
  NICs    84,992
  HBM     8 PB
  DDR5    10 PB
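
These totals are consistent with the per-node configuration of 2 CPUs, 6 GPUs, and 8 NICs; a quick arithmetic check:

```python
nodes = 10_624
print(63_744 / nodes)  # 6.0  GPUs per node
print(21_248 / nodes)  # 2.0  CPUs per node
print(84_992 / nodes)  # 8.0  NICs per node
```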

🤖 ALCF AI Testbed

  • ALCF AI Testbed Systems are in production and available for allocations to the research community
  • Significant improvement in time-to-solution and energy-efficiency for diverse AI for science applications.
  • NAIRR Pilot

Up to 25X improvement for genomic foundation models with 6.5X energy efficiency

Figure 3: SambaNova SN-30: 2nd Gen, 8 nodes with 64 AI Accelerators

Figure 4: Graphcore Bow: Bow-generation accelerators, Pod-64 configuration with 64 accelerators

Figure 5: Cerebras: 2x CS-2 WSE with Memory-X and Swarm-X technologies

Figure 6: GroqRack: 9 nodes, 8 GroqChip v1.5 Tensor Streaming Processor accelerators per node

👥 Team Leads

Planning

Rick Stevens, Ian Foster, Rinku Gupta, Mike Papka, Arvind Ramanathan, Fangfang Xia

Data

Ian Foster, Robert Underwood

Models / Training

Venkat Vishwanath, Sam Foreman

Evaluation

Franck Cappello, Sandeep Madireddy, Bo Li

Post-Training

Eliu Huerta, Azton Wells

Inference

Rajeev Thakur

Comms

Charlie Catlett, David Martin

Distribution

Brad Ullrich

🤝 Teams

  • Planning
  • Data Prep
    • Accumulate 20+ T tokens of high-quality scientific text and structured data
  • Models / Training
    • Train (entirely from scratch) a series of models on publicly available data
  • Evaluation
    • Skills, trustworthiness, safety, robustness, privacy, machine ethics
  • Post-Training
    • Fine-tuning, alignment
  • Inference
    • Model serving, API development / public-facing web services
  • Distribution
    • Licensing, generating and distributing artifacts for public consumption
  • Communication

🦜 Model Training

Goals

  • Want training runs at scale to be:
    • efficient
    • stable
    • reproducible
  • This requires:
    • robust data pipelines / file IO
    • effectively overlapping compute with communication
    • stability across network, filesystem, machine
  • 3D / Multi-dimensional Parallelism strategies
  • Large batch training
  • Second order optimizers
  • Sub-quadratic attention
  • State space models
  • Highly optimized GPU kernels
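
A minimal sketch of how a 3D (pipeline x data x tensor) parallel layout can be expressed with PyTorch's DeviceMesh. The mesh sizes and the CUDA backend are placeholder assumptions (Aurora uses Intel XPUs), not the actual AuroraGPT configuration:

```python
# Sketch: split world_size ranks into a 3D mesh of (pipeline, data, tensor)
# parallel groups. Assumes PyTorch >= 2.2, launched with torchrun so the
# torch.distributed environment variables are already set.
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

dist.init_process_group(backend="nccl")

# Hypothetical sizes: 4-way pipeline x 8-way data x 4-way tensor = 128 ranks.
mesh = init_device_mesh(
    "cuda",
    mesh_shape=(4, 8, 4),
    mesh_dim_names=("pipeline", "data", "tensor"),
)

# Each sub-mesh exposes a process group used for that flavor of communication:
dp_group = mesh["data"].get_group()      # gradient all-reduce / sharding
tp_group = mesh["tensor"].get_group()    # intra-layer (tensor) parallelism
pp_group = mesh["pipeline"].get_group()  # inter-stage point-to-point sends
```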

Challenges

  • Looong time to train, can be:
    • weeks (even months) of continuous training
    • order of magnitude longer than typical NN training jobs
  • Stability issues:
    • failures are expensive (but inevitable)
    • stragglers common at scale
  • Individual jobs are:
    • fragile
    • only as good as the worst rank
    • one hang or bad worker can crash job
    • network / filesystem / other-user(s) dependent
  • Cost / benefits of different collective communication algorithms
    • depend on optimized / efficient implementations
  • Network performance
  • Highly optimized GPU kernels
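
One standard mitigation for the failure modes above is frequent checkpointing, so a crash only loses the work done since the last save. A minimal sketch of that pattern, with placeholder paths and intervals; production runs would use sharded and/or asynchronous checkpointing rather than a single rank-0 save:

```python
import os
import torch
import torch.distributed as dist

CKPT_DIR = "checkpoints"   # placeholder path
SAVE_EVERY = 500           # placeholder interval (training iterations)

def maybe_save(step, model, optimizer):
    """Periodically save a checkpoint so a failed run can resume.

    Sketch only: saves a single file from rank 0; at large scale one would
    shard the save (and overlap it with compute) to avoid stalling every rank.
    """
    if step % SAVE_EVERY != 0:
        return
    if dist.get_rank() == 0:
        os.makedirs(CKPT_DIR, exist_ok=True)
        torch.save(
            {
                "step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
            },
            os.path.join(CKPT_DIR, f"step_{step}.pt"),
        )
    dist.barrier()  # keep all ranks in sync around the save
```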

🚀 Accelerating Dataset Processing at Scale for Training

  • To train a fixed model on trillions of tokens requires:
    • Aggregating data from multiple different corpora (e.g. Reddit, StackExchange, GitHub, etc.)
    • Sampling each training batch according to a fixed distribution across corpora
    • Building indices that map batches of tokens into these files (indexing)
  • The original implementation was slow, and designed to run on a single device
    • Major bottleneck when debugging data pipeline at scale
  • Completely re-wrote it as an asynchronous, distributed implementation that significantly improves performance (a simplified sketch of the blending index follows the figures below)

Figures: Time spent building `BlendableDataset`; time spent building `GPTDataset`.
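
A simplified sketch of the corpus-blending index idea behind `BlendableDataset`: deterministically precompute which corpus each global sample is drawn from, so every rank sees the same fixed mixture. The corpus names and weights here are hypothetical, and this illustrates the idea rather than the actual implementation:

```python
import numpy as np

# Hypothetical corpus weights; the real blend and corpus names differ.
weights = {"web": 0.5, "papers": 0.3, "code": 0.2}

def build_blend_index(weights, num_samples, seed=1234):
    """Map each global sample index to a corpus, using a fixed distribution."""
    names = list(weights)
    probs = np.array([weights[n] for n in names], dtype=np.float64)
    probs /= probs.sum()
    rng = np.random.default_rng(seed)           # same seed -> same index on every rank
    corpus_ids = rng.choice(len(names), size=num_samples, p=probs)
    return names, corpus_ids

names, corpus_ids = build_blend_index(weights, num_samples=1_000_000)
counts = np.bincount(corpus_ids, minlength=len(names))
print({n: int(c) for n, c in zip(names, counts)})  # ~500k / 300k / 200k
```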

📓 References

❤️ Thank you!

  • Organizers

  • Feel free to reach out!

🙏 Acknowledgements

This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

📑 Bibliography

Song, Shuaiwen Leon, Bonnie Kruft, Minjia Zhang, et al. 2023. DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery Through Sophisticated AI System Technologies. https://arxiv.org/abs/2310.04610.

Wei, Jason, Yi Tay, Rishi Bommasani, et al. 2022. Emergent Abilities of Large Language Models. https://arxiv.org/abs/2206.07682.

Yang, Jingfeng, Hongye Jin, Ruixiang Tang, et al. 2023. Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. https://arxiv.org/abs/2304.13712.

🎁 Extras

🚂 Loooooooooong Sequence Lengths


Figure 7: Maximum (achievable) SEQ_LEN for both 25B and 33B models (See: Song et al. (2023))
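
A back-of-the-envelope reason long sequences are hard: without memory-efficient (e.g. flash) attention, the raw attention-score matrix alone grows quadratically with sequence length. The head count and bf16 precision below are placeholder assumptions for illustration, not figures from Song et al. (2023):

```python
def attn_scores_gib(seq_len, n_heads=32, bytes_per_el=2, batch=1):
    """Rough memory for one layer's attention-score matrix
    (batch x heads x seq x seq), assuming bf16 and no FlashAttention."""
    return batch * n_heads * seq_len**2 * bytes_per_el / 2**30

for L in (4_096, 32_768, 131_072):
    print(L, round(attn_scores_gib(L)), "GiB")
# 4096 -> ~1 GiB, 32768 -> ~64 GiB, 131072 -> ~1024 GiB per layer
```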

♻️ Life Cycle of the LLM

📝 Pre-training

Figure 8: Pre-training: virtually all of the compute is used during the pre-training phase

🎀 Fine-Tuning

Figure 9: Fine-tuning: updates the model’s weights to make it better at a certain task.

🍎 Training LLMs

Figure 10: It’s hungry!

Figure 11: Visualization from Yang et al. (2023)

Footnotes

  1. Co-led by: Venkat Vishwanath, Sam Foreman