
Research Scientist, LLM Evaluation & Post-Training

Accepting applications

ChatGPT Jobs · Palo Alto, CA

Full-Time · Mid-senior · AI · Python
Posted: 23 Apr
Category: Test
Experience: Mid-senior
Country: United States
Job Description

Job Title: Research Scientist, LLM Evaluation & Post-Training

Company: Centific

Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote)

Type: Full-time

Salary: $150K - $160K Annually

Role Overview

This is a high-impact, collaborative individual contributor research role focused on LLM evaluation and post-training. You will lead research programs to improve AI models, develop benchmark frameworks, and partner with leading AI organizations to deliver actionable insights.

Key Responsibilities

Define and execute research on LLM evaluation and post-training.
Develop and validate comprehensive evaluation frameworks for LLM and multimodal systems.
Lead research in frontier evaluation domains.
Analyze model behavior and provide recommendations for improvement.
Collaborate with data scientists and AI/ML engineers.
Engage with customer technical stakeholders to provide expert recommendations.
Contribute to internal datasets, frameworks, and intellectual property.
Publish research and contribute to thought leadership.

Core Technical Competencies

Evaluation Science & Benchmarking
LLM & Post-Training Methods
Quantitative Analysis & Scientific Rigor

Required Qualifications

MS or PhD in a quantitative field (PhD strongly preferred).
5+ years of relevant experience in applied ML research, with substantial LLM work.
Demonstrated experience with LLM evaluation, benchmarking, or post-training.
Strong foundation in experimental design and statistical analysis.
Strong Python coding skills and experience with modern ML frameworks.
Ability to critically assess both human and automated evaluation methods.
Strong written and verbal communication skills.

Preferred Qualifications

Hands-on experience with post-training experiments (SFT, RLHF, etc.).
Experience with multimodal and long-context evaluation.
Experience designing agentic evaluation protocols.
Publications in top-tier AI conferences.
Experience in customer-facing applied research or consulting.
Familiarity with GenAI safety and governance.