Solution Architect - GPU/TPU Kernel Optimization
Accepting applications
EPAM Systems · Coimbatore, Tamil Nadu, India
Full-Time · Associate · AI · Machine Learning
Posted: 5d ago
Category: Test
Experience: Associate
Country: India
We are seeking an experienced Solution Architect specializing in GPU/TPU Kernel Optimization to design and optimize high-performance kernels for cutting-edge Machine Learning operations. In this role, you will redefine performance boundaries across massive training runs and high-speed inference workloads while shaping the developer infrastructure that powers next-generation AI systems.
Responsibilities
Design and optimize high-performance kernels (using languages like Pallas, Mosaic and Triton) targeting Tensor Processing Unit (TPU) and Graphics Processing Unit (GPU) architectures for critical Machine Learning (ML) operations, redefining what's possible from massive training runs to high-speed inference
Architect infrastructure such as benchmarking suites, autotuning frameworks, performance analysis tools, regression testing and documentation
Transform how the developer community interacts with increasingly critical custom kernels in key Open-Source Software (OSS) libraries
Track the latest advancements in hardware architectures, compiler technologies and AI models to identify new opportunities for performance optimization through custom kernels
Engage with ML researchers, framework developers (Just After eXecution (JAX), PyTorch) and compiler engineers (Accelerated Linear Algebra (XLA)) to enhance adoption
Identify new requirements and address bottlenecks by providing appropriate solutions
Requirements
12-18 years of experience in software development
Expertise in optimizing TPU/GPU code using low-level kernel languages like Pallas, Compute Unified Device Architecture (CUDA) or Triton
Knowledge of ML frameworks (JAX/PyTorch) and common operations such as attention and Mixture of Experts (MoE), including model optimization and low-precision formats
Understanding of modern accelerators (e.g., data movement, pipelining, heterogeneous compute and scale-out)
Understanding of compiler principles (optimization, code generation) and toolchains such as MLIR and OpenXLA
Demonstrated experience building developer infrastructure, including Open-Source Software (OSS) libraries, flexible high-performance APIs and easy-to-consume documentation to empower the community
Excellent investigative and problem-solving skills, with the ability to communicate effectively across cross-functional teams