19 May | IL Chile | Santiago
Apply on Kit Empleo: kitempleo.cl/empleo/1csfmb
Job Description
Site Reliability Engineer (SRE)

Role Purpose
The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, and performance of digital services in production, balancing service stability with the ability to deliver change at speed.
The role focuses on strengthening operational resilience through engineering, automation, and proactive reliability practices, working closely with application and platform teams.

Scope of the Role
As applied locally, the SRE role covers:
- Production digital services (applications, platforms, data products)
- Associated infrastructure (cloud, CI/CD pipelines, integrations)
- Continuous operations (a 24/7 reliability mindset, not necessarily shift-based)
- Production changes (deployments, configurations)
- Incidents, problems, and service degradations
- Continuous improvement of stability and operational efficiency

Key Responsibilities

Service Reliability & Availability
- Define, implement, and maintain SLIs and SLOs (availability, latency, error rates)
- Continuously monitor service health and anticipate degradations
- Ensure services operate within business-agreed reliability thresholds
- Manage reliability trade-offs between speed and stability

Incident & Problem Management
- Lead or coordinate the response to relevant incidents (L2/L3)
- Ensure:
  - Rapid and structured diagnosis
  - Safe service restoration
  - Clear and effective communication
- Facilitate blameless postmortems
- Convert recurring incidents into an engineering improvement backlog
- Drive long-term remediation rather than reactive firefighting

Automation & Operational Excellence
- Identify repetitive and manual operational tasks
- Design and implement automation for:
  - Deployments
  - Monitoring and alerting
  - Health checks
  - Basic recovery and self-healing (where applicable)
- Reduce toil and increase system resilience through engineering solutions

Change Governance & Production Readiness
- Support change tracking for vendors and internal teams
- Ensure changes:
  - Are traceable
  - Have defined rollback strategies
  - Minimize operational risk
- Validate operational readiness before production
- Participate early in solution and architecture design from a reliability perspective (early involvement)

Metrics, Observability & Continuous Improvement
- Define and maintain near real-time operational KPIs (“service pulse”)
- Ensure every deviation has:
  - Clear ownership
  - Defined corrective actions
- Prevent reactive operations by driving data-driven decision making
- Support the identification, prioritization, and planning of technical debt remediation

What This Role Is Not
The SRE role will not be:
- Solely a dedicated incident operator
- An advanced Service Desk
- The sole owner of service stability (reliability is shared)
- A gatekeeper blocking changes without technical justification
- The owner of contractual MOPs
- A commercial or account management role
- The customer-side account or delivery lead

Experience & Profile (Indicative)
- Proven experience as an SRE, Production Engineer, or in a similar role
- Strong background in production systems and reliability engineering
- Experience working with:
  - Cloud platforms
  - CI/CD pipelines
  - Monitoring and observability tools
- Comfortable operating in product-oriented or POD-based team models
- Strong problem-solving, communication, and collaboration skills

Operating Model Alignment
- Works embedded in PODs or as an enabling function for them
- Focused on enablement and reliability patterns, not centralized control
- Promotes shared ownership of reliability

EEO
Infosys provides equal employment opportunities to applicants and employees without regard to race; color; sex; gender identity; sexual orientation; religious practices and observances; national origin; pregnancy, childbirth, or related medical conditions; or disability.
Required Skill Profession
Computer Occupations