V&V Engineer – AI-Driven Testing & Validation
Accepting applications
TALENT Software Services · Plano, TX
Full-Time · Mid-Senior · AI · Mentor · Python
Posted: 29 Apr
Category: Test
Experience: Mid-Senior
Country: United States
Possible 3 Month CTH | No Fees | Do Not Re-Post | Confidential
TMR ID: YWTWG2
Role: V&V Engineer – AI-Driven Testing & Validation
Work location: Plano, TX
Background and Meet and Greet: MANDATORY
Job Description:
Key Responsibilities
AI/ML & LLM Development/Validation
Lead end-to-end quality engineering for enterprise AI applications, including LLM-powered products, RAG pipelines, and agentic workflows.
Design and execute prompt validation strategies, evaluating LLM responses for accuracy, semantic relevance, hallucination risk, and safety compliance.
Build automated evaluation pipelines for AI model outputs using metrics such as BLEU, ROUGE, embedding-based similarity, precision, recall, and F1-score.
Validate agentic systems (tool use, multi-step reasoning, planner-executor workflows) for correctness, determinism, and failure mode handling.
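The bullets above mention precision, recall, and F1 as evaluation metrics for model outputs. A minimal sketch of one such metric, using only the standard library and hypothetical example answers (not from this posting):

```python
from collections import Counter

def precision_recall_f1(predicted: list[str], expected: list[str]) -> tuple[float, float, float]:
    """Token-overlap precision/recall/F1, a common building block
    for simple reference-based scoring of LLM answers."""
    pred_counts = Counter(predicted)
    exp_counts = Counter(expected)
    # Each token counts at most as often as it appears in both lists.
    overlap = sum((pred_counts & exp_counts).values())
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(expected) if expected else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical example: compare a model answer to a reference answer.
p, r, f = precision_recall_f1("the cat sat on the mat".split(),
                              "the cat lay on the mat".split())
print(round(p, 2), round(r, 2), round(f, 2))  # prints 0.83 0.83 0.83
```

In practice a pipeline like this would be run over a whole evaluation set and aggregated, alongside embedding-based similarity for semantic matches that token overlap misses.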
Test Automation & Frameworks
Architect and maintain Python-based automation frameworks for AI/ML model evaluation, regression testing, and continuous model quality monitoring.
Integrate AI testing into CI/CD pipelines, enabling automated evaluation of model updates, prompt changes, and dataset revisions before release.
Develop reusable test harnesses for prompt regression, golden-set evaluation, A/B comparison of model versions, and human-in-the-loop review workflows.
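One way to read "golden-set evaluation" in the harness bullet above: keep a fixed set of prompts with expected answer properties, and flag any prompt whose current response no longer satisfies them. A minimal sketch, with a stubbed `call_model` standing in for the real LLM call (all names and data here are hypothetical):

```python
# Hypothetical golden set: prompt -> phrases the answer must contain.
GOLDEN_SET = {
    "What is the capital of France?": ["Paris"],
    "Name a prime number below 5.": ["2", "3"],
}

def call_model(prompt: str) -> str:
    """Placeholder for the real LLM call; canned answers for illustration."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "Name a prime number below 5.": "3 is a prime number below 5.",
    }
    return canned[prompt]

def run_golden_set() -> list[str]:
    """Return prompts whose responses contain none of the expected phrases."""
    failures = []
    for prompt, expected_phrases in GOLDEN_SET.items():
        response = call_model(prompt)
        if not any(phrase in response for phrase in expected_phrases):
            failures.append(prompt)
    return failures

print("regressions:", run_golden_set())  # prints regressions: []
```

Hooked into CI, a non-empty failure list would block a prompt or model change from shipping, which is the "prompt regression" gate the bullet describes.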
Data Quality, Bias & Fairness
Perform AI data validation across training and inference pipelines using exploratory data analysis (EDA), schema validation, and cross-validation techniques.
Conduct bias detection and fairness analysis across demographic and contextual slices to ensure responsible AI outcomes.
Drive model robustness testing, including adversarial inputs, distribution shift detection, and stress testing under edge cases.
Establish regression testing standards for retraining and fine-tuning cycles to prevent quality drift after model updates.
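The schema-validation bullet above can be sketched as a per-record check that data entering a training or inference pipeline has the expected fields and types. The schema and records below are hypothetical:

```python
# Hypothetical schema: field name -> required Python type.
SCHEMA = {"user_id": int, "query": str, "label": str}

def validate_record(record: dict) -> list[str]:
    """Return a list of schema violations for one data record."""
    errors = []
    for field, expected_type in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return errors

good = {"user_id": 1, "query": "reset password", "label": "account"}
bad = {"user_id": "1", "query": "reset password"}
print(validate_record(good))  # prints []
print(validate_record(bad))   # prints ['wrong type for user_id: str', 'missing field: label']
```

Production pipelines would typically express this with a dedicated schema library and run it as a gate before training or serving, but the shape of the check is the same.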
Collaboration & Leadership
Partner with client AI engineers to validate solutions built using TensorFlow, PyTorch, LangChain, LangGraph, and LlamaIndex.
Define quality KPIs and acceptance criteria for AI features, and report quality posture to engineering and product leadership.
Mentor QA engineers on AI evaluation methodologies, ML fundamentals, and modern test automation practices.
Champion responsible AI practices, including safety, transparency, explainability, and compliance with evolving AI governance standards.
Required Qualifications
10+ years of professional experience in Quality Engineering and Test Automation, validating complex enterprise applications.
Proficient in validating AI/ML systems, including Generative AI and LLM-based applications.
Strong proficiency in Python and experience building automation frameworks from the ground up.
Practical experience with prompt validation, agentic workflow testing, and AI model evaluation.
Working knowledge of evaluation metrics: BLEU, ROUGE, embedding similarity, precision, recall, F1-score, and human-evaluation methodologies.
Experience with AI/ML frameworks and ecosystems: TensorFlow, PyTorch, LangChain, LangGraph, and LlamaIndex.
Solid understanding of data validation techniques: EDA, schema validation, cross-validation, and statistical analysis.
Experience integrating automated testing into CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI, Azure DevOps).
Familiarity with bias detection, fairness assessment, and AI safety evaluation techniques.
Preferred Qualifications
Experience with vector databases, retrieval-augmented generation (RAG), and embedding pipelines.
Background in MLOps tooling such as MLflow, Weights & Biases, or similar experiment tracking platforms.
Exposure to LLM observability and evaluation tools (e.g., LangSmith, Ragas, DeepEval, TruLens).
Familiarity with cloud AI services on AWS, Azure, or GCP (Bedrock, Azure OpenAI, Vertex AI).
Knowledge of AI governance frameworks, model cards, and emerging AI regulatory standards.
Bachelor's or Master's degree in Computer Science, Data Science, or a related technical field.
The following details must accompany your submission:
First Name, Middle name, and Last Name:
City and State:
Open to Relocate?
Rate:
Availability:
Phone #:
Mobile #:
Email address:
Visa type:
Visa Expiration Date:
Hiring Status:
MiguelAngel Buonafina - ERM
North America
Tel.: +***