AI/ML Research Scientist, LLM Post-Training & Evaluation
Accepting applications
ChatGPT Jobs · Palo Alto, CA
Full-Time · Mid-Senior · AI · Machine Learning · Python
Posted: 17 Apr
Category: Test
Experience: Mid-Senior
Country: United States
Job Description
Research Scientist, LLM Evaluation & Post-Training
Company: Centific
Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote)
Type: Full-time
Salary: $150K–$160K annually
About The Role
Centific is seeking a Research Scientist focused on LLM evaluation and post-training. This role involves defining and executing research agendas, developing evaluation frameworks, analyzing model behavior, and collaborating with cross-functional teams and customer stakeholders. The goal is to improve LLM evaluation methodologies and drive advancements in AI deployment.
Key Responsibilities
Define and execute research on LLM evaluation and post-training.
Develop and validate comprehensive evaluation frameworks for LLM and multimodal systems.
Lead research in frontier evaluation domains (long-context, cross-modal, dynamic multi-turn).
Analyze model behavior and provide recommendations for improvement.
Collaborate with data scientists and ML engineers on evaluation and training pipelines.
Engage with customer technical stakeholders to understand evaluation goals and provide recommendations.
Contribute to knowledge creation through datasets, frameworks, reports, and publications.
Promote thought leadership in LLM evaluation and post-training.
Required Qualifications
MS or PhD in Computer Science, Machine Learning, Statistics, Applied Mathematics, AI, or a related quantitative field (PhD preferred).
5+ years of relevant experience in applied ML research, with substantial work in LLMs or foundation models.
Demonstrated experience with LLM evaluation, benchmarking, alignment, post-training, or model quality research.
Strong foundation in experimental design, statistical analysis, and scientific reasoning for ML systems.
Strong Python coding skills for research, data processing, and ML frameworks (PyTorch, Hugging Face, JAX/TensorFlow).
Ability to evaluate and compare human and automated evaluation methods.
Strong written and verbal communication skills.
Preferred Qualifications
Hands-on experience with fine-tuning or post-training experiments (SFT, preference optimization, RLHF/RLAIF).
Experience with multimodal and long-context evaluation.
Experience designing multi-turn, interactive, or agentic evaluation protocols.
Publications on LLM evaluation at top venues, or open-source contributions in LLM evaluation.
Experience in customer-facing applied research or technical consulting.
Familiarity with safety, trustworthiness, and governance in GenAI evaluation.
Similar Jobs
HBM PE DFT
Micron · Boise, United States, North America
Test Engineer - Photonic
NVIDIA · Roskilde, Denmark, Europe
Lead Engineer, Healthcare Data Operations and Strategy
NVIDIA · Santa Clara, United States, North America
Administrative Assistant – Protected Categories (Law 68/99)
Applied Materials · Treviso, Italy, Europe