RTL Design Engineer - Compute Engines

Accepting applications

Acceler8 Talent · Santa Clara, CA

Full-Time · Mid-Senior · AI · RTL · SoC
Posted: 21 Apr
Category: Design
Experience: Mid-Senior
Country: United States

Acceler8 Talent has partnered with a venture-backed, stealth-stage company building rack-level AI inference platforms to hire an experienced RTL Design Engineer with expertise in compute engines.

Their technology is centered around a differentiated system-on-chip design that enables system-level optimizations for highly efficient inference in data center environments. The team is developing both custom hardware and extending open-source software to support next-generation AI workloads.


This role focuses on microarchitecture design and RTL development for compute engines within next-generation SoCs. The position offers end-to-end ownership, from early architectural definition through to silicon realization.


Key Responsibilities
Architect and develop execution pipelines, including scheduling strategies and control mechanisms
Define compute datapaths, register files, and localized memory structures such as scratchpads and hierarchical storage
Implement support for emerging numerical formats (e.g., FP16, BF16, FP8, FP4)
Optimize designs across performance, efficiency, power consumption, and silicon area
Balance compute throughput, memory bandwidth, and broader SoC constraints
Collaborate closely with architecture, compiler, kernel, and performance engineering teams
Analyze representative AI workloads to inform design and optimization decisions
Explore and assess trade-offs in instruction set architecture and programming models
Use modeling and profiling tools to guide architectural and microarchitectural choices
Drive initial RTL development efforts and contribute to ongoing implementation
Support verification processes and timing closure activities
Participate in emulation, debugging, silicon bring-up, and post-silicon performance tuning


Qualifications
Solid experience in AI compute or accelerator design, demonstrated through ownership of complex hardware components across multiple technology generations
Strong knowledge of vector and matrix processing architectures
Deep understanding of floating-point computation and quantization techniques
Experience optimizing for performance per watt
Track record of contributing to successful silicon delivery
Prior experience in startup or early-stage hardware programs is beneficial
Advanced degree (MS or PhD) with approximately 3–5 years of relevant experience, or equivalent practical background