Incident Engineer
Accepting applications
WeVision LLC · Irvine, CA
Full-Time · Entry · AI · Python
Posted: 6d ago
Category: Test
Experience: Entry
Country: United States
Job Description for WeVision Incident Engineer (Harmony/Kafka/CoreDataPL direction)
About the Job
As an Incident Engineer supporting one of our largest clients, you will work closely with Incident Managers, global peer Incident Engineers, and client-side senior development engineers and team leads to mitigate production incidents during your shift. You will use incident runbooks, existing documentation, independent research, investigation, troubleshooting, testing, and prior experience to restore services efficiently. This role provides exposure to modern data processing and streaming technologies, including Kafka-based systems, and the opportunity to continuously expand your technical and operational expertise.
Responsibilities
Act as a first-tier incident responder by promptly acknowledging alerts, analyzing incidents, and independently identifying accurate mitigation solutions
Investigate incidents using runbooks, knowledge bases, documentation, and self-driven research
Identify root causes, clearly document findings, and capture actionable learning points for team sharing
Communicate proactively with Incident Managers and client-side senior engineers or team leads
Escalate incidents as needed with thorough technical context and analysis
Work independently on incidents while maintaining strong attention to detail and accuracy
Effectively multitask across incidents, operational tickets, and project work in a fast-paced environment
Recognize recurring patterns and connect insights across multiple incidents
Continuously expand technical knowledge through documentation, runbooks, and online resources
Prepare and present demos or knowledge-sharing sessions based on learnings
Perform routine operational tasks such as cluster or machine maintenance and access or permission management
Skills Summary
Incident Response & Production Support
Kafka & Streaming Platforms
Databricks
Distributed Systems Troubleshooting
Cloud Platforms (AWS)
Linux & Command-Line Environments
Scripting & Automation (Bash, Python, JSON)
Workflow Orchestration (Airflow)
Containerization & Kubernetes
Monitoring & Observability Tools
Technical Documentation
Cross-Functional Communication
Attention to Detail & Multitasking
Minimum Qualifications
Associate degree in Computer Science, Information Systems, or a related technical field, or equivalent practical experience
2 years of experience in technical support, operations, QA, software development, or related technical roles
Strong analytical, troubleshooting, and communication skills
Ability to learn new technologies quickly and work independently
Quick, self-sufficient learner able to pick up a broad range of topics in a short time from video training or documentation
Interest in diving into technical fields and applying extensive troubleshooting skills
Uplifting, empathetic, and a strong team player
Growth mindset and lifelong learner
Preferred Qualifications
Bachelor's degree or higher in a technical field, or additional relevant practical experience
Hands-on experience supporting or troubleshooting distributed or streaming systems
Familiarity with Kafka or similar messaging platforms
Ability to work with multiple differing priorities in a fast-paced, constantly changing environment
Ability to work effectively in a fast-paced, constantly changing environment
Excellent communication, logical thinking, and analytical skills
Strong multitasking skills, resilience, and the ability to handle pressure
Nice-to-Have Qualifications
Experience with data streaming and processing technologies such as Kafka and ZooKeeper
Exposure to AWS infrastructure and cloud services
Experience with Databricks or other data warehousing platforms
Familiarity with Linux environments and command-line tools
Experience with scripting languages such as Bash or Python, and with data formats such as JSON
Exposure to Airflow, Kubernetes, containers, or similar orchestration technologies
Experience with monitoring, logging, or collaboration tools such as GitHub, Datadog, Grafana, Jira, or Confluence