ML Engineer — Robot Foundation Model Pretraining

We are building large-scale embodied intelligence systems designed to operate across complex real-world environments. Our work focuses on training foundation models for robotics using massive multimodal datasets spanning video, language, proprioception, and action traces.
We are seeking ML Engineers to design and execute large-scale pretraining efforts for robot foundation models. This role focuses on turning raw multimodal robotic data into generalizable intelligence that transfers across embodiments, tasks, and environments.
You will work on core model training systems that define how robotic foundation models learn at scale, with direct impact on their capabilities, generalization, and real-world performance.

What You’ll Do
Design and Run Large-Scale Pretraining
Lead or contribute to large-scale training runs for robot foundation models
Work with transformer-based and diffusion-based architectures for multimodal learning
Define objectives, architectures, and training strategies for large-scale model pretraining

Build Multimodal Data and Training Strategies
Develop data mixtures and sampling strategies across large-scale multimodal datasets (video, language, action, proprioception, state)
Design training curricula for improving generalization across tasks and embodiments
Turn raw robotic interaction data into structured learning signals for foundation models

Drive Data-Driven Model Improvement
Run ablation studies to understand scaling behavior, data quality effects, and architectural tradeoffs
Analyze training dynamics, scaling laws, and model failure modes
Iterate on data and training strategies to improve general-purpose performance

Collaborate on Large-Scale Systems
Work closely with infrastructure and systems teams to improve cluster utilization, throughput, and training reliability
Contribute to optimizing multi-node, multi-GPU distributed training pipelines
Help ensure efficient execution of large-scale training workloads

Shape Data Collection and Model Direction
Guide data collection efforts toward high-impact capabilities and gaps
Identify and incorporate new datasets to expand model coverage and generalization
Bridge raw robotic data collection with downstream model capabilities

What We’re Looking For
Experience training large-scale transformer or diffusion models (e.g., language, video, audio, or generative models)
Experience with multi-node, multi-GPU distributed training systems
Strong understanding of scaling laws, optimization dynamics, and large-model training behavior and failure modes
Strong proficiency in PyTorch and the ability to debug across the full ML stack
Comfort working with large-scale datasets and iterative experimentation
Strong empirical rigor combined with the ability to iterate quickly

Preferred Experience
Experience with multimodal or generative model training at scale
Background in robotics, embodied AI, or sequential decision-making systems
Experience working with large distributed training infrastructure
Familiarity with data-driven curriculum learning or mixture design
Experience analyzing or deriving insights from large-scale training runs

Why This Role Matters
Build the core intelligence layer for general-purpose robotic systems
Directly influence how robots learn from real-world data at scale
Work at the intersection of large-scale AI training, robotics, and systems engineering
Shape foundation models that generalize across embodiments and environments

About the Company
We are a research-driven AI and robotics company focused on building scalable embodied intelligence systems. By combining advances in machine learning, large-scale training infrastructure, and robotics, we aim to develop models capable of operating in and adapting to the physical world.

We are committed to building an inclusive and diverse workplace and encourage applicants from all backgrounds to apply.