We're hiring on behalf of a Haystack partner!

The Role

Lead accuracy-centric architecture and optimization for LLMs, VLMs, and multimodal AI models.
Drive day-zero hardware enablement on current and future AI platforms.
Design and implement quantization strategies that balance model quality, performance, and hardware constraints.
Analyze and resolve accuracy regressions and numerical stability issues across the inference stack.
Collaborate with performance engineers to co-optimize kernels and execution strategies.
Define and implement accuracy evaluation metrics and tooling for robust model deployment.

What You'll Need

Extensive experience with LLMs and/or VLMs in production or pre-production environments.
Expert-level understanding of quantization, numerics, and precision tradeoffs.
Deep knowledge of transformer architectures, attention mechanisms, and mixture-of-experts (MoE) models.
Proven ability to balance accuracy, performance, and hardware constraints.
Strong Python skills and experience across compiler, kernel, and hardware abstraction layers.
A Bachelor's degree with 4+ years of experience, a Master's with 3+ years, or a PhD with 2+ years, in a related field.

What's On Offer

Opportunity to work at the forefront of Cloud AI and foundation model inference.
A senior technical role with broad cross-functional impact.
Competitive annual discretionary bonus and opportunity for annual RSU grants.
Comprehensive benefits package designed to support work-life balance.

Apply via Haystack today!


AI Engineer
