19 May | Infosys | Santiago
Apply on Kit Empleo: kitempleo.cl/empleo/1cr3ei
Job Description – Site Reliability Engineer (SRE)
Role Purpose: The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, and performance of digital services in production, balancing service stability with the ability to deliver change at speed. The role focuses on strengthening operational resilience through engineering, automation, and proactive reliability practices, working closely with application and platform teams.
Scope of the Role
- Production digital services (applications, platforms, data products)
- Associated infrastructure (cloud, CI/CD pipelines, integrations)
- Continuous operations (24/7 reliability mindset, not necessarily shift-based)
- Production changes (deployments, configurations)
- Incidents, problems, and service degradations
- Continuous improvement of stability and operational efficiency
Key Responsibilities
Service Reliability & Availability
- Define, implement, and maintain SLIs and SLOs (availability, latency, error rates)
- Continuously monitor service health and anticipate degradations
- Ensure services operate within business‑agreed reliability thresholds
- Manage reliability trade‑offs between speed and stability
Incident & Problem Management
- Lead or coordinate response to relevant incidents (L2/L3)
- Ensure rapid and structured diagnosis, safe service restoration, and clear and effective communication
- Facilitate blameless postmortems
- Convert recurring incidents into engineering improvement backlog
- Drive long‑term remediation rather than reactive firefighting
Automation & Operational Excellence
- Identify repetitive and manual operational tasks
- Design and implement automation for deployments, monitoring and alerting, health checks, and basic recovery and self‑healing (where applicable)
- Reduce toil and increase system resilience through engineering solutions
Change Governance & Production Readiness
- Support tracking of changes made by vendors and internal teams
- Ensure changes are traceable, have defined rollback strategies, and minimize operational risk
- Validate operational readiness before production
- Participate early in solution and architecture design from a reliability perspective
Metrics, Observability & Continuous Improvement
- Define and maintain near real‑time operational KPIs (service pulse)
- Ensure every deviation has clear ownership and defined corrective actions
- Reduce reactive operations by promoting data‑driven decision making
- Support identification, prioritization, and planning of technical debt remediation
What This Role Is Not
- Solely a dedicated incident operator
- An advanced Service Desk
- The sole owner of service stability (reliability is shared)
- A gatekeeper blocking changes without technical justification
- The owner of contractual MOPs
- A commercial or account management role
- The customer‑side account or delivery lead
Experience & Profile (Indicative)
- Proven experience as an SRE, Production Engineer, or in a similar role
- Strong background in production systems and reliability engineering
- Experience working with cloud platforms, CI/CD pipelines, and monitoring and observability tools
- Comfortable operating in product‑oriented or POD‑based team models
- Strong problem‑solving, communication, and collaboration skills
Operating Model Alignment
- Works embedded in PODs or as an enabling function alongside them
- Focused on enablement and reliability patterns, not centralized control
- Promotes shared ownership of reliability
EEO
Infosys provides equal employment opportunities to applicants and employees without regard to race; color; sex; gender identity; sexual orientation; religious practices and observances; national origin; pregnancy, childbirth, or related medical conditions; or disability.