Principal II, Site Reliability Engineering

This job posting is no longer active.

Category: Global Technology Services
Position Type: Regular Full-Time
External ID: 9492
Location: Torrance, CA, United States
Date Posted: Oct 5, 2023
Hiring Range: 153,100.00 to 168,700.00 USD Annually




The Principal II Site Reliability Engineer acts as a technical guide, applying engineering techniques to automate manual, repeatable operational work and partnering with Application Development and Infrastructure teams to architect and operate reliable, scalable, and performant software and services.



  • Partner with application developers and solution architects to ensure services are built for scale and performance.
  • Lead the setting of service-level objectives, agreements, and indicators (SLOs, SLAs, and SLIs) for the underlying service by collaborating with Application Development, Product, and Business Owners.
  • Design, develop, and build scripts, software, and tools that improve the reliability of systems in Production, including fixing issues, responding to incidents, and taking on-call responsibilities.
  • Improve the overall resilience of systems and provide awareness of the health and performance of services across all applications and infrastructure.
  • Improve service performance metrics like latency, page load speed and ETL, and help proactively identify performance issues across the system.
  • Implement monitoring solutions and build dashboards and alerts based on the four golden signals of SRE (latency, traffic, errors, and saturation), providing a single source for resolving performance and availability issues in the services they support.
  • Spread information across DevOps and business teams, encouraging a blameless culture passionate about workflow visibility and collaboration.
  • Perform root-cause analysis of sophisticated problems involving multiple parties, networks, hardware, and software that relate to scaling and performance.
  • Serves as technical owner to ensure delivery of SRE initiatives.
  • Performs result reviews and coaches the team in SRE areas of expertise.
  • Provide continuous, best-practices-driven research, leverage industry resources and market trends, and liaise with internal team members.
  • Escalates risks and resolves issues to enable team delivery.
  • Helps to foster a fun, collaborative, and supportive culture in which we are able to do career-defining work.
  • Ensures the team delivers high-quality, accurate, viable, and reliable products.




  • Experience working with Linux & Windows OS along with Scripting experience using PowerShell, Python, Linux/Unix Shell Scripting
  • Experience with Monitoring and Logging Tools – Splunk, Dynatrace, Azure Monitoring, Datadog, Prometheus with Grafana
  • Experience working with DevOps Automation tools – Azure DevOps, GitHub, GitHub Actions, SonarQube, Artifactory, Google Cloud Build, Cloud Deploy, Argo CD/Flux
  • Experience with Public Cloud Platforms – Azure, GCP
  • Experience with Docker, Kubernetes (AKS, GKE), Helm, Service Mesh
  • Experience with Google Anthos, Apigee, Confluent Kafka, MongoDB, SQL and Oracle Databases
  • Experience with Microservices Architectures
  • Experience with Infrastructure as Code automation tools – Terraform, Ansible
  • An understanding of programming languages such as C#, Ruby, Perl, Java, Go, Python and PHP
  • Excellent written and verbal communication skills
  • Ability to communicate effectively to technical and executive audiences
  • Recognized company-wide for technical expertise in one area within Release Management
  • Provides SME support in area of expertise
  • Creative problem solving and innovation
  • Provide technical leadership and vision

Certificates / Training:

  • Azure / Google Cloud Certifications
  • AZ-400: Designing and Implementing Microsoft DevOps Solutions
  • Google Cloud Professional Cloud DevOps Engineer
  • Certified Kubernetes Administrator (CKA) / Certified Kubernetes Application Developer (CKAD)


  • Good understanding of Application Security Architectures and Guidance
  • Knowledge of threat modelling and risk assessment techniques
  • Knowledge of cybersecurity threats, current best practices and latest software
  • Experience in configuration of Web Application Firewall Rules using Akamai


  • 7+ years of experience in Release Management, with deep expertise in one area



  • Bachelor's degree in Computer Science; an equivalent combination of experience may be considered in lieu of education.


  • Advanced Technical Degree





At Herbalife, we value doing what’s right. We are proud to be an equal opportunity employer, making decisions without regard to race, color, religion, sex, sexual orientation, gender identity, marital status, national origin, age, veteran status, disability, or any other protected characteristic. We value diversity, strive for inclusivity, and believe the differences among our teammates are a key contributor to Herbalife’s ongoing success.


Herbalife offers a variety of benefits to eligible employees in the U.S. (limited to the 50 States and the District of Columbia), which include Group Health Programs, other Voluntary Benefit Programs, and Paid Time Off. Group Health Programs include Medical, Dental, Vision, Health Savings Account (HSA), Flexible Spending Accounts (FSA), Basic Life/AD&D, Short-Term and Long-Term Disability, and an Employee Assistance Program (EAP).


Other Voluntary Benefit Programs include a 401(k) plan, Wellness Incentive Program, Employee Stock Purchase Plan (ESPP), Supplemental Life/Critical Illness/Hospitalization/Accident Insurance, and Pet Insurance. Paid time off includes Company-observed U.S. Holidays, Floating Holidays, Vacation, Sick Time, a Volunteer Program, Paid Maternity and Paternity Leave, Bereavement Leave, Personal Leave and time off for voting.


If reasonable accommodation is needed to participate in the job application or interview process, to perform essential job functions, and/or to receive other benefits and privileges of employment, please email your request to [email protected].
