Campus Pride Jobs

Job Information

COTIVITI, INC. Site Reliability Engineer in SOUTH JORDAN, Utah

Site Reliability Engineer

Job Locations: US-Remote
ID: 2024-13875
Category: Engineering/IT
Position Type: Full-Time

Overview

The Site Reliability Engineer (SRE) is responsible for leading the continuous evolution of the capabilities needed to ensure the reliable delivery and operation of the software solutions which enable Cotiviti's ability to retrieve medical records from healthcare providers.

The SRE works closely with architects, development teams, production operations, and product owners to enable the appropriate level of reliability to meet business objectives. The SRE serves as a mentor and role model, providing thought leadership and collaborating with a cross-functional team to drive the continuous improvement of SDLC and production operations. They improve reliability by focusing on monitoring, productivity, performance, and availability.

The SRE has three primary areas of responsibility:

  • Operations: emergency incident response; change management; infrastructure management
  • System support: ensure system stability; production operations enablement
  • Process improvement: post-incident reviews; improve software development, deployment, and release practices; improve support practices; recommend changes to solution architecture

Collaborating with stakeholders, the SRE: defines business-aligned Service Level Indicators and Objectives; implements capabilities which provide real-time insight into the health of applications and the development pipelines; implements process and technology changes; automates routine SDLC tasks.

The SRE possesses a deep understanding of AWS cloud-native services; they make sure the team employs the correct strategies and tactics to ensure the reliability of the applications and services operating on AWS.

Responsibilities

  • Ability to translate functional and nonfunctional requirements and strategies into solution reliability strategy, architecture, and roadmap in collaboration with development team members and other architects.
  • Ability to define key business-value aligned Service Level Indicators and Objectives. Automate SLIs/SLOs through observability tools.
  • Ability to lead data-driven improvement in reliability of the software solution.
  • Ability to apply SRE principles and practices to solutions built using AWS cloud-native services, such as but not limited to:
      • API Gateways
      • Lambda functions built using NestJS/NodeJS
      • Datastores (DynamoDB, OpenSearch, RDS, S3, HealthLake)
      • Event messaging technologies (SQS, EventBridge, Kinesis)
      • Logging/Tracing (CloudWatch, X-Ray)
      • Infrastructure as Code (Terraform)
  • Ability to drive continuous process and technology improvements to increase the reliability of deployments and releases.
  • Coaching/training development team members as necessary to drive improvements in the teams' delivery of the solution.
  • Support the continuous evolution of best practices and standards for solution reliability.
  • Complete all responsibilities as outlined on annual Performance Plan.

Qualifications

  • Proven track record of applying SRE principles and practices to drive reliable software delivery and operation.
  • Self-starter with a passion for delivering reliable, mission-critical solutions which delight customers.
  • Expert in applying process improvement methodologies (Lean, Six Sigma, Kaizen, etc.) to software engineering practices.
  • Bachelor's degree in Computer Science, Information Technology or related field, or equivalent work experience.
  • 10+ years of experience in at least two IT disciplines (such as data/solution architecture, Technical/Infrastructure architecture, Information/Data Architecture and Business Architecture) in a multitier enterprise environment. *... For full info follow application link.

Equal Opportunity Employer/Protected Veterans/Individuals with Disabilities