Research Scientist, LLM Evaluation & Post-Training
Accepting applications · ChatGPT Jobs · Palo Alto, CA
Full-Time · Mid-Senior · AI · Python · aiategan
Posted: 23 Apr
Category: Test
Experience: Mid-Senior
Country: United States
Job Description
Job Title: Research Scientist, LLM Evaluation & Post-Training
Company: Centific
Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote)
Type: Full-time
Salary: $150K–$160K annually
Role Overview
This is a high-impact research role focused on LLM evaluation and post-training that combines individual-contributor work with close collaboration. You will lead research programs to improve AI models, develop benchmarking frameworks, and partner with leading AI organizations to deliver actionable insights.
Key Responsibilities
Define and execute research on LLM evaluation and post-training.
Develop and validate comprehensive evaluation frameworks for LLM and multimodal systems.
Lead research in frontier evaluation domains.
Analyze model behavior and provide recommendations for improvement.
Collaborate with data scientists and AI/ML engineers.
Engage with customer technical stakeholders to provide expert recommendations.
Contribute to internal datasets, frameworks, and intellectual property.
Publish research and contribute to thought leadership.
Core Technical Competencies
Evaluation Science & Benchmarking
LLM & Post-Training Methods
Quantitative Analysis & Scientific Rigor
Required Qualifications
MS or PhD in a quantitative field (PhD strongly preferred).
5+ years of relevant experience in applied ML research, with substantial LLM work.
Demonstrated experience with LLM evaluation, benchmarking, or post-training.
Strong foundation in experimental design and statistical analysis.
Strong Python coding skills and experience with modern ML frameworks.
Ability to critically assess both human and automated evaluation methods.
Strong written and verbal communication skills.
Preferred Qualifications
Hands-on experience with post-training experiments (SFT, RLHF, etc.).
Experience with multimodal and long-context evaluation.
Experience designing agentic evaluation protocols.
Publications at top-tier AI conferences.
Experience in customer-facing applied research or consulting.
Familiarity with GenAI safety and governance.
Similar Jobs
HBM PE DFT
Micron · Boise, United States, North America
Test Engineer - Photonic
NVIDIA · Roskilde, Denmark, Europe
Lead Engineer, Healthcare Data Operations and Strategy
NVIDIA · Santa Clara, United States, North America
Administrative Assistant – Protected Categories (Law 68/99)
Applied Materials · Treviso, Italy, Europe