Founding Forward Deployed Machine Learning Engineer [33144]
Stealth Startup · Sunnyvale, CA
Full-Time · Mid-Senior · AI · Machine Learning · Python
Posted
1d ago
Experience
Mid-Senior
Country
United States
The role
You work directly with medical imaging companies preparing FDA 510(k) or De Novo submissions, and with hospitals evaluating those models for adoption. You own customer engagements end-to-end and help shape how we run them.
Understand the customer's domain, model, and clinical goal — deeply. This means reading between the lines on what they say they want versus what they actually need, and figuring out how to persuade them when those diverge.
Identify the highest-priority question — and design the evaluation that answers it. A customer might ask: "What are my model's weaknesses?" The real work is decomposing that question. Who is the stakeholder we are building evidence for? What uncertainties actually matter? Do we start with the data, the representations, or the failure surface? How do we generate rigorous evidence in one or two weeks — evidence the customer can use for development decisions, regulatory go/no-go decisions or submissions, or hospital conversations?
Run evaluations and accumulate findings. Sometimes the result is a concrete failure mode. Sometimes it is evidence that a concern does not materialize. Over time, these evaluations build a richer picture of how models behave across deployments, populations, and clinical settings.
Generalize what we learn into reusable infrastructure. Over time, patterns across evaluations become infrastructure: evaluation pipelines, evidence schemas, failure taxonomies, and stakeholder-facing reporting systems. But in consequential domains, evidence is never fully automatic. The important work is understanding what evidence matters for a specific stakeholder and decision — and coming in as an authority who provides judgment, not just data.
The people doing this work become unusually knowledgeable about how medical AI systems behave in the real world, and over time they become trusted representatives of the company to model developers, regulators, and hospital stakeholders.
We are hiring an ML engineer, but the role is really for a founding member of the team. As part of that, we are looking for someone who can challenge our assumptions and work with us beyond the technical to build what we think is foundational infrastructure for the future of ML, starting with delivering real value to our customers.
Qualifications
Required:
Strong empirical ML skills — comfortable designing and scoping investigations, analyzing failure cases, and reasoning about distribution shift, uncertainty, and signal-vs-artifact in real ML systems
High tolerance for ambiguity — able to work with messy real-world data, imperfect ground truth, and questions that don't have clean answers
Fluent in Python for the full investigation loop — from raw data through analysis to defensible conclusions
Clear technical communication — able to explain findings, including their uncertainty, to customers and non-technical stakeholders who need to act on them
Preferred:
Substantive experience in ML — 3–5 years of professional experience, or a PhD / strong research track record (e.g., model evaluation, robustness, OOD detection, or equivalent ML depth)
Applied evaluation experience — evaluating, validating, or debugging real-world ML systems
Familiarity with adjacent fields — interpretability, post-deployment monitoring, or safety-critical evaluation
Domain exposure — medical imaging, healthcare ML, or other safety-critical domains
Customer-facing experience — working directly with customers or cross-functional stakeholders in a technical role
Compensation: $160k–$220k salary, 0.5–2.0% equity, insurance