V&V Engineer – AI-Driven Testing & Validation
Accepting applications
TALENT Software Services · Plano, TX
Full-Time · Mid-Senior · AI · Mentor · Python
Posted: 29 Apr
Category: Test
Experience: Mid-Senior
Country: United States
Possible 3 Month CTH | No Fees | Do Not Re-Post | Confidential
TMR ID: YWTWG2
Role: V&V Engineer – AI-Driven Testing & Validation
Work location: Plano, TX
Background and Meet and Greet: MANDATORY
Job Description:
Key Responsibilities
AI/ML & LLM Development/Validation
Lead end-to-end quality engineering for enterprise AI applications, including LLM-powered products, RAG pipelines, and agentic workflows.
Design and execute prompt validation strategies, evaluating LLM responses for accuracy, semantic relevance, hallucination risk, and safety compliance.
Build automated evaluation pipelines for AI model outputs using metrics such as BLEU, ROUGE, embedding-based similarity, precision, recall, and F1-score.
Validate agentic systems (tool use, multi-step reasoning, planner-executor workflows) for correctness, determinism, and failure mode handling.
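The bullets above mention precision, recall, and F1 as evaluation metrics for model outputs. A minimal sketch of one such metric, using only the standard library and hypothetical example answers (not from this posting):

```python
from collections import Counter

def precision_recall_f1(predicted: list[str], expected: list[str]) -> tuple[float, float, float]:
    """Token-overlap precision/recall/F1, a common building block
    for simple reference-based scoring of LLM answers."""
    pred_counts = Counter(predicted)
    exp_counts = Counter(expected)
    # Each token counts at most as often as it appears in both lists.
    overlap = sum((pred_counts & exp_counts).values())
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(expected) if expected else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical example: compare a model answer to a reference answer.
p, r, f = precision_recall_f1("the cat sat on the mat".split(),
                              "the cat lay on the mat".split())
print(round(p, 2), round(r, 2), round(f, 2))  # prints 0.83 0.83 0.83
```

In practice a pipeline like this would be run over a whole evaluation set and aggregated, alongside embedding-based similarity for semantic matches that token overlap misses.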
Test Automation & Frameworks
Architect and maintain Python-based automation frameworks for AI/ML model evaluation, regression testing, and continuous model quality monitoring.
Integrate AI testing into CI/CD pipelines, enabling automated evaluation of model updates, prompt changes, and dataset revisions before release.
Develop reusable test harnesses for prompt regression, golden-set evaluation, A/B comparison of model versions, and human-in-the-loop review workflows.
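One way to read "golden-set evaluation" in the harness bullet above: keep a fixed set of prompts with expected answer properties, and flag any prompt whose current response no longer satisfies them. A minimal sketch, with a stubbed `call_model` standing in for the real LLM call (all names and data here are hypothetical):

```python
# Hypothetical golden set: prompt -> phrases the answer must contain.
GOLDEN_SET = {
    "What is the capital of France?": ["Paris"],
    "Name a prime number below 5.": ["2", "3"],
}

def call_model(prompt: str) -> str:
    """Placeholder for the real LLM call; canned answers for illustration."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "Name a prime number below 5.": "3 is a prime number below 5.",
    }
    return canned[prompt]

def run_golden_set() -> list[str]:
    """Return prompts whose responses contain none of the expected phrases."""
    failures = []
    for prompt, expected_phrases in GOLDEN_SET.items():
        response = call_model(prompt)
        if not any(phrase in response for phrase in expected_phrases):
            failures.append(prompt)
    return failures

print("regressions:", run_golden_set())  # prints regressions: []
```

Hooked into CI, a non-empty failure list would block a prompt or model change from shipping, which is the "prompt regression" gate the bullet describes.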
Data Quality, Bias & Fairness
Perform AI data validation across training and inference pipelines using exploratory data analysis (EDA), schema validation, and cross-validation techniques.
Conduct bias detection and fairness analysis across demographic and contextual slices to ensure responsible AI outcomes.
Drive model robustness testing, including adversarial inputs, distribution shift detection, and stress testing under edge cases.
Establish regression testing standards for retraining and fine-tuning cycles to prevent quality drift after model updates.
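The schema-validation bullet above can be sketched as a per-record check that data entering a training or inference pipeline has the expected fields and types. The schema and records below are hypothetical:

```python
# Hypothetical schema: field name -> required Python type.
SCHEMA = {"user_id": int, "query": str, "label": str}

def validate_record(record: dict) -> list[str]:
    """Return a list of schema violations for one data record."""
    errors = []
    for field, expected_type in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return errors

good = {"user_id": 1, "query": "reset password", "label": "account"}
bad = {"user_id": "1", "query": "reset password"}
print(validate_record(good))  # prints []
print(validate_record(bad))   # prints ['wrong type for user_id: str', 'missing field: label']
```

Production pipelines would typically express this with a dedicated schema library and run it as a gate before training or serving, but the shape of the check is the same.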
Collaboration & Leadership
Partner with client AI engineers to validate solutions built using TensorFlow, PyTorch, LangChain, LangGraph, and LlamaIndex.
Define quality KPIs and acceptance criteria for AI features, and report quality posture to engineering and product leadership.
Mentor QA engineers on AI evaluation methodologies, ML fundamentals, and modern test automation practices.
Champion responsible AI practices, including safety, transparency, explainability, and compliance with evolving AI governance standards.
Required Qualifications
10+ years of professional experience in Quality Engineering and Test Automation, validating complex enterprise applications.
Proficient in validating AI/ML systems, including Generative AI and LLM-based applications.
Strong proficiency in Python and experience building automation frameworks from the ground up.
Practical experience with prompt validation, agentic workflow testing, and AI model evaluation.
Working knowledge of evaluation metrics: BLEU, ROUGE, embedding similarity, precision, recall, F1-score, and human-evaluation methodologies.
Experience with AI/ML frameworks and ecosystems: TensorFlow, PyTorch, LangChain, LangGraph, and LlamaIndex.
Solid understanding of data validation techniques: EDA, schema validation, cross-validation, and statistical analysis.
Experience integrating automated testing into CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI, Azure DevOps).
Familiarity with bias detection, fairness assessment, and AI safety evaluation techniques.
Preferred Qualifications
Experience with vector databases, retrieval-augmented generation (RAG), and embedding pipelines.
Background in MLOps tooling such as MLflow, Weights & Biases, or similar experiment tracking platforms.
Exposure to LLM observability and evaluation tools (e.g., LangSmith, Ragas, DeepEval, TruLens).
Familiarity with cloud AI services on AWS, Azure, or GCP (Bedrock, Azure OpenAI, Vertex AI).
Knowledge of AI governance frameworks, model cards, and emerging AI regulatory standards.
Bachelor's or Master's degree in Computer Science, Data Science, or a related technical field.
The following details must accompany your submission:
First Name, Middle name, and Last Name:
City and State:
Open to Relocate?
Rate:
Availability:
Phone #:
Mobile #:
Email address:
Visa type:
Visa Expiration Date:
Hiring Status:
MiguelAngel Buonafina - ERM
North America
Tel.: +***