The mission

Every claim we make about latency, throughput, and efficiency has to be earned: measured, modeled, proven. We do not ship projections. This role produces the numbers that change what we build while there is still time to change it.

What you own

Low-level cycle modeling and performance analysis of our accelerator's execution engine. Branch behavior, cache pressure, memory access choreography, queue depth — translated into architectural feedback that drives tape-out decisions.

What you'll do

Build cycle-accurate or statistical performance models of agent workloads running on our hardware
Profile on real silicon using hardware performance counters, and in pre-silicon emulation environments
Identify bottlenecks across the data path and produce latency budgets that drive upstream architecture decisions
Own the performance regression suite as the team and hardware scale
Produce analysis rigorous enough to change a tape-out decision

What we're looking for

Deep fluency in x86 or ARM microarchitecture — knows what a pipeline stall costs in wall-clock terms, not just in theory
Has written or used architectural simulators (gem5, Sniper, ZSim, or equivalent) to model real systems
Reads assembly to diagnose problems, not just to write code
Background in CPU performance engineering at Intel, AMD, Qualcomm, Apple, Ampere, or equivalent
Bonus: experience with PCIe device driver performance or accelerator bring-up

Signal keywords

PMU counters
gem5 / Sniper / ZSim
Microarchitecture
Cache modeling
Pipeline analysis
C / C++

CPU & Microarchitecture Performance Engineer
