Staff Compiler Engineer

Accepting applications

Acceler8 Talent · Palo Alto, CA

Full-Time · Mid-Senior · AI, ASIC, C++, FPGA, Python
Posted: 4d ago
Category: Test
Experience: Mid-Senior
Country: United States

Staff / Principal Compiler Engineer

Acceler8 Talent is partnering with an early-stage startup to hire a Compiler Engineer who will join as a Member of Technical Staff (MTS).

They're building advanced AI-driven tools and infrastructure for custom ASIC development at scale, focusing on enabling next-generation domain-specific hardware architectures for modern machine learning workloads. Their team combines expertise across AI systems, compilers, and silicon design to develop tightly integrated hardware/software solutions for emerging compute platforms.


They're seeking a staff/principal-level compiler engineer with deep experience building code generation toolchains for custom AI accelerators. Ideal candidates will have experience delivering production compiler stacks targeting specialized ML hardware.



What You’ll Do
As a member of the compiler team, you will own key portions of the compiler stack targeting a SIMD/VLIW-style NPU architecture, spanning graph ingestion through code generation on production silicon. You will work closely with architecture and silicon teams to co-design hardware and compiler capabilities.



Responsibilities
Own compiler functionality end-to-end, including graph ingestion (e.g., ONNX, PyTorch), IR optimization, AI-focused code generation, instruction scheduling, and register allocation for a SIMD/VLIW NPU
Implement and maintain memory management functionality, including compiler-managed scratchpad memory, data tiling, bank allocation, DMA scheduling, and double-buffering across SRAM banks
Design and optimize mid-end and backend compiler passes including operator fusion, loop transformations, vectorization, and software pipelining to maximize hardware utilization
Collaborate with architecture and silicon teams on ISA and instruction encoding design, using workload performance data to influence hardware decisions
Support quantization and mixed-precision lowering across FP32, BF16, FP16/FP8/FP4, INT8/INT4, and related precision modes, with correct numerical behavior
Benchmark compiler output using cycle-accurate models, RTL simulation environments, and FPGA prototypes while owning quality-of-results tracking
Contribute to the growth and technical direction of the compiler team over time



What We’re Looking For
Qualifications & Skills
Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, or a related field
5+ years of experience building compilers or code generation toolchains for custom accelerators; experience focused solely on general-purpose CPU compiler infrastructure is not sufficient
Hands-on experience targeting ML/AI accelerator architectures and compiler stacks for custom hardware platforms
Strong understanding of instruction scheduling, register allocation, and software pipelining, particularly for SIMD/VLIW or spatial architectures
Experience with ML-focused optimizations including tiling strategies, loop nest optimization, operator fusion, and optimization of workloads such as convolution, attention, reductions, element-wise operations, and transpositions
Experience with software-managed memory systems, including scratchpad allocation, data layout optimization, DMA orchestration, and multi-buffering techniques
Strong C++ development skills and Python proficiency
Familiarity with LLVM and/or MLIR infrastructure
Ability to lead technical initiatives and help scale compiler engineering efforts



Preferred / Bonus Experience
Hardware/software co-design experience, including ISA definition, instruction encoding design, or compiler-driven hardware feature development
Experience designing IRs for ML accelerators, including custom dialects, MLIR-based flows, or graph-level IR systems
Familiarity with ML frameworks such as PyTorch or TensorFlow and graph interchange formats such as ONNX
Experience benchmarking and profiling compiler output on hardware platforms, FPGA prototypes, or cycle-accurate simulators
Understanding of ML inference optimization techniques including FlashAttention, RadixAttention, PagedAttention, continuous batching, speculative decoding, KV cache management, and decode scheduling
Contributions to open-source ML compiler ecosystems such as TVM, MLIR, Triton, or XLA
Experience with energy-efficient, high-performance accelerator bring-up and optimization