AI / LLM Applications Engineer
Accepting applications
Ardenus · Raleigh, NC
Full-Time · Entry · AI · Python
Posted
2d ago
Category
Test
Experience
Entry
Country
United States
Summary
Ardenus builds applied intelligence systems for environments where language models are doing structured, consequential work — extraction, reasoning, agent execution, structured generation — rather than generating prose for a chat surface. The bar is production reliability, cost discipline, and measurable quality gains over time.
As an AI Applications Engineer, you own the systems that make frontier models usable in production: caching, tool execution, agent orchestration, evaluation harnesses, and cost-per-task instrumentation. You are not training models. You are the engineer who decides how they are run, how they are measured, and how they are improved.
Description
Architect and operate the model-orchestration layer, including caching strategy, tool-call routing, batched inference, and model selection per call site.
Design and ship agent loops that read structured inputs, execute tools deterministically, and produce auditable outputs.
Own the evaluation infrastructure end-to-end — eval design, dataset curation, regression detection, and the dashboards that drive prompt and model decisions.
Build the protocol surfaces that expose platform capabilities to internal and external agent systems.
Drive token-cost and latency budgets per product surface. Make tradeoffs visible.
Set the engineering standard for how applied AI is built and shipped across the company.
Minimum Qualifications
You should:
Have 6+ years shipping production software, with at least 2 of those years building LLM-backed systems that real customers depend on (not internal demos, not prototypes).
Have first-principles understanding of transformer architecture, attention, decoding strategies, and the engineering tradeoffs of context length, caching, and quantization.
Have shipped at least one production agent system and be able to articulate why it stayed reliable when most don't.
Have built or substantially extended an evaluation harness that materially changed a product decision.
Be deeply fluent in Python and TypeScript, with the systems instinct to know when to drop to a lower level.
Preferred Qualifications
You can:
Point to public technical work — a published paper, an open-source contribution to a major LLM framework or inference engine, a conference talk, or a widely read technical post.
Demonstrate experience with tool-use protocols, structured output schemas, or constrained decoding at production scale.
Show retrieval architecture you designed — embedding pipelines, hybrid search, re-ranking, or context compression — with measured impact on a real product.
Reason quantitatively about token economics, cache hit rates, and per-task cost ceilings.
Speak to inference-time optimization (KV-cache reuse, speculative decoding, distillation) from production experience, not reading.