AgenticOps Engineer
ExpandIQ · United States · Accepting applications
Full-Time · Associate
Posted: 30 Apr
Category: Test
Experience: Associate
Country: United States
A leading PE-backed SaaS platform serving a highly regulated industry at scale. The company is not just modernizing its software — it is rebuilding how software itself gets built, using a managed fleet of AI agents operating across the full software delivery lifecycle. This is one of the most forward-leaning AI engineering investments happening inside a mature SaaS company right now.
About the Role
This is not a role for someone who read about AI agents last year. This is a role for someone who is actively in it every day.
The company has stood up a dedicated AgenticOps practice — a cross-cutting engineering function that operates and continuously improves a managed fleet of AI agents across every SDLC phase: requirements, design, implementation, testing, documentation, deployment, and maintenance.
The AgenticOps Engineer owns the operational layer that makes this fleet run reliably, securely, and at scale. You will not be embedded in a single product team. You will have a cross-cutting view of how agents perform across the entire engineering organization.
What You Will Own
Agent Platform & Infrastructure — Build and maintain the end-to-end platform orchestrating the agent fleet: task intake, routing, sandboxed execution, automated validation gates, output submission, and feedback loops
Agent Configuration & Optimization — Select, configure, and tune agents — Claude Code, Codex, Devin, custom stacks — for the task types, languages, and codebases they are assigned to; evaluate new agents and model versions as the market evolves
Quality Systems & Output Validation — Build and maintain automated gates before any human review: test passage, coverage thresholds, style compliance, security scanning, and build integrity; own evaluation harnesses and regression suites for agent workflows
Pipeline Operations & Task Design — Partner with engineering leads to define what "agent-ready" means for each SDLC phase; shape the intake process and drive the organization toward self-service as the practice matures
Observability & Monitoring — Own dashboards, logging, alerting, and analytics providing visibility into agent behavior, performance, cost, and outcomes across the fleet; surface degradation before teams feel it
Cost Management — Monitor and optimize LLM spend and compute; track cost per unit of work produced — dollars per merged PR, per generated test suite, per validated deployment — and drive it down
Security, Governance & Compliance — Enforce agent access controls, data handling policies, and audit trail requirements; ensure every agent-produced artifact is traceable end-to-end
Escalation Support & Knowledge Sharing — Serve as the on-call specialist when engineers hit persistent walls with agent output; diagnose root cause, pair on fixes, and roll learnings back into shared configuration and documentation
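The validation-gate responsibility above can be pictured as a chain of automated checks that every agent-produced change must clear before it reaches human review. The sketch below is illustrative only: the gate names, the coverage threshold, and the fields on `change` are assumptions for the example, not a real platform API.

```python
# Illustrative sketch of an automated validation-gate chain for
# agent-produced changes. Gate names, thresholds, and change fields
# are hypothetical, not a real platform schema.
from dataclasses import dataclass
from typing import Callable

@dataclass
class GateResult:
    name: str
    passed: bool

def run_gates(change: dict, gates: list[tuple[str, Callable[[dict], bool]]]) -> list[GateResult]:
    """Run every gate; a change is eligible for human review only if all pass."""
    return [GateResult(name, check(change)) for name, check in gates]

# Example gates: tests must pass and coverage must clear a threshold.
GATES = [
    ("tests_pass", lambda c: c.get("tests_failed", 1) == 0),
    ("coverage", lambda c: c.get("coverage", 0.0) >= 0.80),
]

change = {"tests_failed": 0, "coverage": 0.85}
results = run_gates(change, GATES)
eligible_for_review = all(r.passed for r in results)
```

In practice each gate would wrap a real tool (test runner, coverage report, linter, security scanner), but the shape — ordered checks gating submission — is the point.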
What We Are Looking For
Required
4+ years of software engineering experience with strong fundamentals in systems thinking and debugging
Hands-on, current experience building with LLM APIs — prompt design, tool use, function calling, context management
Demonstrated ability to diagnose and resolve complex cross-cutting technical issues across multiple teams and systems
Strong analytical skills — comfortable building dashboards, writing queries, and reasoning about statistical patterns in non-deterministic system output
Working knowledge of secure software development practices — access control, audit logging, sensitive data handling in automated pipelines
Excellent written and verbal communication — this role lives on documentation, cross-team clarity, and knowledge transfer
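The analytics expectation above, paired with the cost-per-unit framing earlier (dollars per merged PR), can be sketched as a small aggregation over an event log. The event fields and agent name below are assumptions for illustration, not a real schema.

```python
# Hypothetical "dollars per merged PR" aggregation over an event log.
# Field names (agent, type, usd_cost) are assumptions for the sketch.
from collections import defaultdict

def cost_per_merged_pr(events: list[dict]) -> dict[str, float]:
    """Total LLM spend per agent divided by that agent's merged-PR count."""
    spend: dict[str, float] = defaultdict(float)
    merged: dict[str, int] = defaultdict(int)
    for e in events:
        spend[e["agent"]] += e.get("usd_cost", 0.0)
        if e.get("type") == "pr_merged":
            merged[e["agent"]] += 1
    return {a: spend[a] / merged[a] for a in spend if merged[a] > 0}

events = [
    {"agent": "claude-code", "type": "task", "usd_cost": 1.50},
    {"agent": "claude-code", "type": "pr_merged", "usd_cost": 0.50},
]
costs = cost_per_merged_pr(events)
```

A real pipeline would pull these events from observability tooling rather than an in-memory list, but "drive the metric down" starts with computing it per agent and per task type.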
Preferred
Experience with prompt evaluation frameworks and LLM observability tooling — LangSmith, Braintrust, Humanloop, or equivalent
Background in developer tooling, platform engineering, or SRE/DevOps with reliability principles applied to non-deterministic systems
Familiarity with multiple LLM providers and coding agents — Claude Code, Codex, Devin
Hands-on experience with Kubernetes, Helm, AWS EKS, Terraform, and GitLab CI
Familiarity with MCP — Model Context Protocol — including servers, clients, tools, and resource exposure
Exposure to SOC 2, ISO 27001, or similar compliance frameworks and producing audit evidence for automated systems
Experience working cross-functionally across multiple product teams without direct authority
The Tech Stack
AI coding assistants: Claude Code (primary), Copilot, Cursor
Cloud development environments: Coder
Infrastructure: AWS with EKS and Terraform
CI/CD: GitLab
Identity/SSO: Azure AD and Okta
Observability: CloudWatch and Grafana
What This Role Is Not
Not a team lead or management role — individual contributor with cross-cutting influence
Not a data science or ML engineering role — you tune how the organization uses models; you do not train or fine-tune them
Not a project manager — you own the health of the agent layer, not any team's delivery plan
Not an architect for product systems — architectural decisions belong to senior engineers on product teams
Why This Role Is Rare
Most companies are still debating whether to use AI agents in their engineering workflows. This company has already committed — standing up a dedicated practice, a managed fleet, and a cross-cutting engineering function to operate it at scale. The AgenticOps Engineer joining now will help build the infrastructure the entire engineering organization runs on, at a company with the scale, resources, and PE backing to do it right.
If you are actively building with AI agents every day and want to turn that into a serious operational discipline at a company moving this fast, apply.