Inference Engineer
Accepting applications
Midjourney · San Francisco Bay Area
Full-Time · Mid-Senior
Posted: 7 May
Category: Test
Experience: Mid-Senior
Country: United States
About Midjourney
Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species. We are a small self-funded team focused on design, human infrastructure, and AI. Our compute-to-headcount ratio is among the highest in the world, which means you will have more immediate influence over global AI infrastructure here than at any Big Tech firm.
About the Role
The inference layer is the critical path between a model and the image a user sees. As Inference Engineer, you will own that layer end-to-end: serving architecture, batching, queue routing, rollout, and the bridge between research and production.
In this role, you join a small team and set the architecture for inference as a platform. This includes how models move from research to production, ensuring the pipeline remains understandable as models change. Your work is the foundation of how we honour our community's passion for each model's unique creative strengths. This architecture will support years of new model generations while continuing to serve earlier ones, and help us fuel creativity into the future.
You will orchestrate low-latency, high-performance inference on cutting-edge GPU clusters across the world. We don't tie ourselves to a single vendor, and our scale and market influence mean vendors regularly bring us new accelerator architectures early. The systems you design will determine how fast, reliably, and cost-effectively we serve generative models at scale as we push toward sub-second image generation and real-time experiences.
The work involves complex systems engineering: optimising inference across diverse accelerator families, building custom serving stacks for new generative architectures, and qualifying new hardware at scale. This is work at the AI frontier, with problems few teams get to own end-to-end.
What You'll Do
Own the inference pipeline end-to-end.
Set the technical direction for inference as a platform, including how research models become production services.
Drive consolidation and clarity in the inference architecture. We are small and nimble; simplification and standardisation are critical to our ability to scale.
Optimise for latency, throughput, and accelerator efficiency across our global fleet.
Partner with research on the production viability of new model architectures, including quantisation, distillation, and multi-GPU and multi-stage pipelines.
Build the instrumentation and observability that make the pipeline understandable at every layer.
Set the standard for design review, documentation, and runbooks on the inference team.
What We're Looking For
Significant systems engineering experience with an architect mindset, including shipping large-scale ML inference or distributed serving systems in production.
Deep familiarity with running and debugging PyTorch code, and some comfort diving lower into the stack to debug CUDA errors when necessary.
Understanding of modern GPU-based model serving paradigms (e.g. vLLM, TensorRT, or other custom stacks).
Self-directed, communicative, and action-oriented. You set technical direction through strategy, action, and, above all, delivered work.
Comfort holding the whole picture of a fast-moving system in your head, and improving it incrementally without losing the thread.
Strong writing and design-review instincts. You make decisions legible to humans and agents alike, with a focus on leveraging and scaling AI tools.
You communicate clearly and directly. The role lives at the seam between inference, research, and backend, and we value straightforward collaboration-focused feedback.
You take care of the people around you, not just the code. Caring about humanity starts with caring about your teammates – we hire engineers who actively help the people next to them grow.
We care more about what you have built than where you have built it. Great people at Midjourney have started in high school just as often as they have come from notable companies and universities. If you feel like you’ve got most of the skills we’re looking for, but are worried that it might not be enough, reach out anyway. Let’s chat.
Nice to Have
Production experience with diffusion model architectures or other generative pipelines.
Hardware-agnostic serving experience across multiple accelerator families.
Experience designing or evolving a model lifecycle framework: versioning, A/B, gradual rollout, automated rollback.
Familiarity with HPC schedulers (e.g. Slurm), Kubernetes, and/or low-level fleet orchestration.
Why Midjourney
Speed of a startup, freedom and resources of a research lab. No investors, no quarterly reporting cycle, no committees deciding the roadmap.
Tiny team, large ambitions. Decisions show up in production within days.
We move at the speed of thought. Iterate fast, isolate variables, ship.
Hardware-agnostic by design. Inference architecture choices are real architecture choices, not procurement choices.
You will define the inference platform for years of model generations, including the substrate for real-time generative experiences. Few roles anywhere offer that scope and level of influence.
US-based, flexible location. Headquarters in San Francisco, with the team spread across remote locations and our London office.
*The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets, experience and training, certifications and licensure, location, and other business or organizational needs.
Equal Opportunity Employer
We provide and promote equal opportunity in employment, compensation, and other terms and conditions of employment without discrimination because of race, color, creed, religion, national origin, ancestry, citizenship status, sex or gender, gender identity or gender expression (including transgender status), sexual orientation, marital status, military service and veteran status, physical or mental disability, family medical history, genetic information or other protected medical condition, political affiliation, or any other characteristic protected by and in accordance with applicable laws.