Sr. Reliability Engineer, Digital Commerce
Accepting applications
Skechers · Manhattan Beach, CA
Full-Time · Mid-Senior
Posted: 3d ago
Category: Test
Experience: Mid-Senior
Country: United States
WHO WE ARE:
Headquartered in Southern California, Skechers—the Comfort Technology Company®—has spent over 30 years helping men, women, and kids everywhere look and feel good. Comfort innovation is at the core of everything we do, driving the development of stylish, high-quality products at a great value. From our diverse footwear collections to our expanding range of apparel and accessories, Skechers is a complete lifestyle brand.
ABOUT THE ROLE:
The Sr. Reliability Engineer, Digital Commerce is responsible for ensuring the stability, performance, and operational readiness of the global digital commerce ecosystem. This role owns end-to-end reliability of the customer shopping journey – from storefront experience and product discovery through checkout, order lifecycle, and commerce integrations – with a specific focus on the Salesforce Commerce Cloud (SFCC) ecosystem including B2C Commerce storefronts, integrations, and commerce services.
Working at the intersection of engineering, product, and operations, this engineer drives proactive reliability practices, observability standards, incident management discipline, and automation initiatives that reduce operational risk and strengthen digital commerce resilience at global scale.
WHAT YOU’LL DO:
Commerce Platform Reliability
Own end-to-end operational reliability across the digital commerce stack, including storefront availability, product catalog and pricing services, search and discovery, checkout and payment processing, order lifecycle, and fulfillment integrations (OMS, WMS, payment gateways, tax, fraud, and shipping).
Ensure stability and performance of the Salesforce Commerce Cloud (SFCC) ecosystem, including Business Manager configurations, WebDAV operations, replication processes, cartridge-based customization layers, and headless/microservice components integrated with SFCC.
Establish operational standards and reliability guardrails for commerce services and all dependent systems across varying traffic conditions, including peak demand periods.
Partner with order management teams to ensure reliability across Manhattan Active Order Management (MAO) order routing, fulfillment execution integrations, and downstream fulfillment event integrity, including BOPIS flows.
Observability & Monitoring
Design and implement monitoring frameworks across digital commerce services, with proactive detection of conversion-impacting issues before they affect customers.
Define and manage SLIs, SLOs, and alerting strategies tied to business impact including conversion degradation, checkout failure rates, order placement success, and site performance and latency.
Build operational dashboards that translate technical signals into revenue and customer experience insights.
Implement monitoring across SFCC-specific signals including pipeline performance, OCAPI health, SCAPI latency, cache effectiveness, replication health, third-party integration response times, and MAO order orchestration signals such as routing latency, fulfillment status synchronization, and exception queue health.
Incident Management & Operational Readiness
Lead coordination of high-severity commerce incidents, including triage, root cause analysis, and systemic remediation planning, and improve MTTR through automation, tooling, and process optimization.
Establish and maintain incident runbooks, operational playbooks, and continuous operational readiness standards across commerce platforms.
Own operational readiness and release planning for major commerce launches, campaigns, and seasonal peak events, including SFCC traffic scaling strategy validation.
Partner with Salesforce Commerce Cloud support during platform incidents, managing severity escalation processes and coordinating internal response during platform-level disruptions.
Performance & Scalability Engineering
Identify and remediate performance bottlenecks impacting site speed, checkout latency, and service responsiveness, including SFCC-specific optimization across page caching, CDN configuration, search indexing, and cartridge execution efficiency.
Partner with engineering teams to drive performance optimization initiatives, support load testing, and own capacity planning and peak readiness validation.
Ensure commerce systems scale reliably to support business growth and global expansion.
Automation & Reliability Engineering
Develop automation to reduce manual operational effort and recurring incident classes, including SFCC deployment validation, replication monitoring, integration failure detection, and release risk scoring.
Implement reliability engineering patterns such as automated recovery workflows, self-healing service orchestration, reliability validation pipelines, and operational health scoring.
Drive adoption of reliability engineering best practices across delivery teams.
Cross-Functional Collaboration
Partner with product, engineering, merchandising, marketing, and operations teams to align reliability priorities with business objectives, serving as a reliability advocate during architecture design and solution reviews.
Act as the reliability liaison between internal commerce engineering teams and Salesforce Commerce Cloud platform teams, coordinating with external vendors and SaaS providers during incident resolution and performance optimization.
Translate technical reliability risks into clear business impact narratives for both technical and non-technical stakeholders.
WHAT YOU’LL BRING:
Hands-on experience supporting Salesforce Commerce Cloud (SFCC) production environments, including composable commerce ecosystems integrating SFCC with CMS, search, personalization, and middleware platforms.
Experience supporting high-traffic global eCommerce environments with modern commerce architectures including headless, composable, and microservices-based platforms.
Strong background in incident management, observability, and operational excellence practices, including hands-on experience with observability platforms such as Datadog.
Familiarity with order management systems, payment platforms (such as Cybersource or Adyen), or commerce SaaS ecosystems; exposure to Manhattan Active Order Management (MAO) is a strong plus.
Experience with CI/CD pipelines, deployment strategies, release governance, APIs, event-driven systems, and commerce integrations.
Strong understanding of distributed systems, cloud-native infrastructure, and performance optimization for web applications and backend services.
Experience leveraging AI-assisted engineering tools to improve operational efficiency and automation.
Strong analytical mindset with the ability to connect technical reliability to business outcomes and communicate effectively with both technical and non-technical stakeholders.
REQUIREMENTS:
Bachelor's degree in Computer Science, Engineering, or related field, or equivalent experience.
7+ years in Site Reliability Engineering, Production Engineering, or Digital Commerce Platform Operations.
This is a hybrid role based in Manhattan Beach, CA, requiring a minimum of 3 days onsite per week.