AI Curation Data Scientist
Accepting applications
Recruit Group · United States
Full-Time · Associate · AI · Python
Posted: 2d ago
Category: Test
Experience: Associate
Country: United States
Our client is redefining how healthcare organizations access, trust, and act on patient data. Their AI-powered platform transforms fragmented clinical information into structured, traceable, and clinically meaningful intelligence that supports better care decisions across the healthcare ecosystem.
They are seeking an AI Curation Data Scientist to help expand the organization’s advanced health data extraction, normalization, and AI model training capabilities. This role is ideal for someone who thrives at the intersection of AI/ML engineering, healthcare interoperability, and large-scale data curation — and who wants to build systems that directly impact patient outcomes.
You’ll join a highly technical and collaborative remote team working on mission-critical initiatives involving EHR data processing, LLM training pipelines, de-identification workflows, and clinical data quality systems. The environment is fast-moving, deeply innovative, and focused on delivering reliable, production-grade AI solutions in healthcare.
What You’ll Do
Develop and optimize software pipelines for extracting and integrating structured and unstructured healthcare data
Build and maintain AI/ML workflows for data classification, normalization, and analysis
Train, fine-tune, and evaluate large language models and embedding-based systems
Curate and validate high-quality datasets used for LLM training and model improvement
Work with complex healthcare data formats including XML, JSON, FHIR, and C-CDA
Implement de-identification strategies and ensure compliance with PHI/PII handling policies
Design and execute data quality assessments, validation frameworks, and automated testing processes
Collaborate cross-functionally with engineering and product teams to improve scalability and system performance
Contribute to code repositories, testing infrastructure, and deployment best practices
Explore emerging AI methodologies and rapidly prototype innovative solutions in a highly iterative environment
What You’ll Need
Required Qualifications
Master’s degree or equivalent experience in Computer Science, Software Engineering, Statistics, Biology, or a related field
5+ years of hands-on experience in AI/ML engineering, data science, software development, or predictive analytics
Strong experience training and tuning transformer models and LLMs
Significant experience curating datasets for AI model training
Advanced Python development experience, including building extraction, classification, or NLP tools
Hands-on experience with embedding models, sentence transformers, and modern LLM tooling
Strong experience parsing and processing complex data formats such as XML and JSON
Familiarity with healthcare interoperability standards such as FHIR and/or C-CDA
Experience with TensorFlow, PyTorch, scikit-learn, or similar ML frameworks
Proficiency with Git and software development best practices
Experience developing unit and integration tests for scientific or healthcare-focused applications
Strong communication skills and ability to collaborate effectively within remote teams
A proactive, solutions-oriented mindset with a passion for building high-impact products
Preferred Qualifications
Deep understanding of regex and advanced text-processing techniques
Experience with Unix command-line tooling such as jq, xq, sed, and bash scripting
Strong AWS experience, particularly around data storage and AI training infrastructure tradeoffs
Experience working with HIPAA, PHI/PII handling, and healthcare de-identification strategies
Experience extending or customizing open-source AI tooling
Familiarity with AI-assisted coding workflows and tools such as GitHub Copilot, Claude Code, or similar platforms
Experience working across multiple programming languages and distributed technical teams
Why This Role
Opportunity to build AI systems that directly improve healthcare outcomes
Work alongside experienced experts in AI, software systems, molecular biology, and clinical medicine
High-impact role within a fast-growing and mission-driven environment
Exposure to cutting-edge challenges in healthcare interoperability, AI model training, and clinical data engineering
Collaborative culture that values innovation, ownership, and technical excellence
Fully remote flexibility with meaningful opportunities for growth and technical leadership
Let’s Talk
If you’re excited by the opportunity to apply advanced AI and machine learning techniques to real-world healthcare challenges — while working with a highly talented and mission-driven team — we’d love to connect.