Member of Technical Staff - Microarchitect / RTL Design
Accepting applications
Architect Labs · Palo Alto, CA
Full-Time · Mid-senior · AI, ASIC, DDR, FPGA, PCIe
Posted
1d ago
Category
Design
Experience
Mid-senior
Country
United States
About Architect
Architect is a frontier AI lab for chip design. We build AI models and tools for on-demand custom ASICs at scale. Our goal is to co-design custom ASICs alongside evolving ML workloads and to enable a new era of domain-specific chips that unlock capabilities impossible with current hardware paradigms. Born out of Stanford Research, our team blends AI with silicon, with founding members from Anthropic, Google DeepMind, Meta SuperIntelligence, xAI, Apple, and Intel.
What You'll Do
As a Founding Member of the Technical Staff on the RTL Design team at Architect, you'll own the AI-driven microarchitecture and RTL design of mission-critical SoC blocks and sub-systems (SS) going into production silicon. You will be expected to define, drive, and revise the block-level microarchitecture specification for one of the fundamental HW accelerator blocks.
You are a fit for our current openings if you have hands-on design and block-owner experience with any of the following ASIC components: ML/AI accelerators, SIMD vector engines, DSPs, GPUs, on-chip memory SS, on-chip interconnect SS (NoCs, DMAs, etc.) with custom or standard protocols (AXI, etc.), IO and peripheral integration (PCIe, DDR, CXL, etc.), CPU/Host/controllers, Security, Compression.
As a lab, we are investing in building a world-class HW design team, so if you have relevant experience or background that is not listed here, please still reach out to us!
Own the AI-driven RTL design flow end-to-end (at the frontend): from code generation through incorporating feedback from the lint, CDC, synthesis, and timing closure stages to close the design loop.
Work directly with the principal architect to refine microarchitectural specs, resolve implementation trade-offs, and feed area/timing/power realities back into the architecture and internal AI systems.
Define and maintain interface specifications (e.g. AXI, AXI-Stream, or custom-built) for block- and SS-level integration.
Build and maintain RTL infrastructure for our in-house AI-driven flow: design automation scripts, regression flows, lint/CDC waivers, and integration collateral.
Collaborate closely with DV: support DV bring-up with reference models, assertions, test plans, and architectural documentation for verification closure.
Collaborate closely with SW and ML: support and guide our SW and ML experts in revising and improving our in-house AI flow, drawing on your own experience.
Support FPGA prototyping on Xilinx for early functional validation.
What We'd Like to See
Qualifications & Skills:
Degree: Bachelor's, Master's, or PhD in Electrical Engineering, Computer Engineering, or a closely related field.
Experience: 5+ years (10+ preferred) in RTL design, with at least one advanced-node tapeout.
Domain Background: RTL design experience on specialized HW accelerators, such as SoCs/IPs integrating XPUs (NPU, GPU, AR/VR) or AI/ML accelerators. Ideally, you have worked on Apple Neural Engine, Qualcomm Hexagon NPU / AI Engine, Google Edge TPU, AMD XDNA, Samsung NPU, MediaTek APU, or NVIDIA DLA blocks, or on accelerators at Groq, Cerebras, MatX, d-Matrix, or similar.
SystemVerilog: Clear, synthesizable, lint-clean RTL with strong design habits such as parameterization, modularity, reuse and configurability.
Block-Level Depth: Hands-on experience with block-specific compute datapaths and data movement, such as MAC arrays, vector units, accumulators, on-chip SRAM controllers and arbiters, DMA engines, scratchpad memory management, etc.
SoC Methodology: Solid grasp of synthesis, timing constraints, clock domain crossings, reset strategies, AMBA protocols (AXI, AHB, APB), power management techniques, etc.
Python: Strong skills for design automation, regression infrastructure, and tooling.
PPA Ownership: Experience taking a block from RTL through synthesis and working with PD teams on timing/area/power closure.
Leadership: Ability to lead RTL design efforts and grow into a team lead over time.
Bonus:
Low-power design techniques: clock gating, power gating, multi-voltage domains, UPF.
FPGA prototyping experience (ideally Xilinx Vivado/Vitis).
Familiarity with SIMD/VLIW execution pipelines or instruction-driven datapath design.
Experience writing SVA assertions and functional coverage for design-side verification.
Prior experience building and delivering IP in your block of expertise, such as DMA controllers, memory subsystems, interconnects, or similar SoC infrastructure blocks.
Domain-specific expertise: Track record of research and development on energy-efficient, high-performance HW accelerators in your block of expertise.
What We Offer
Competitive salary and meaningful equity stake
Fast-paced startup with autonomy and visible impact
Cutting-edge challenges at the intersection of AI and silicon design