AI/ML Research Scientist, LLM Post-Training & Evaluation

Tags: Full-Time · Mid-Senior · AI · Machine Learning · Python

Posted: 17 Apr
Category: Test
Experience: Mid-Senior
Country: United States
Job Description

Research Scientist, LLM Evaluation & Post-Training

Company: Centific

Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote)

Type: Full-time

Salary: $150K - $160K annually

About The Role

Centific is seeking a Research Scientist focused on LLM evaluation and post-training. The role involves defining and executing research agendas, developing evaluation frameworks, analyzing model behavior, and collaborating with cross-functional teams and customer stakeholders, with the goal of improving LLM evaluation methodologies and driving advancements in AI deployment.

Key Responsibilities

Define and execute research on LLM evaluation and post-training.
Develop and validate comprehensive evaluation frameworks for LLM and multimodal systems.
Lead research in frontier evaluation domains (long-context, cross-modal, dynamic multi-turn).
Analyze model behavior and provide recommendations for improvement.
Collaborate with data scientists and ML engineers on evaluation and training pipelines.
Engage with customer technical stakeholders to understand evaluation goals and provide recommendations.
Contribute to knowledge creation through datasets, frameworks, reports, and publications.
Promote thought leadership in LLM evaluation and post-training.

Required Qualifications

MS or PhD in Computer Science, Machine Learning, Statistics, Applied Mathematics, AI, or related quantitative field (PhD preferred).
5+ years of relevant experience in applied ML research, with substantial work in LLMs or foundation models.
Demonstrated experience with LLM evaluation, benchmarking, alignment, post-training, or model quality research.
Strong foundation in experimental design, statistical analysis, and scientific reasoning for ML systems.
Strong Python coding skills for research, data processing, and ML frameworks (PyTorch, Hugging Face, JAX/TensorFlow).
Ability to evaluate and compare human and automated evaluation methods.
Strong written and verbal communication skills.

Preferred Qualifications

Hands-on experience with fine-tuning or post-training experiments (SFT, preference optimization, RLHF/RLAIF).
Experience with multimodal and long-context evaluation.
Experience designing multi-turn, interactive, or agentic evaluation protocols.
Publications or open-source contributions in LLM evaluation at top venues.
Experience in customer-facing applied research or technical consulting.
Familiarity with safety, trustworthiness, and governance in GenAI evaluation.