About the role

The Site Reliability Engineer is responsible for the operation, reliability, and performance of platform infrastructure, ensuring high availability, scalability, and operational excellence.

Banking, Onsite

Key Responsibilities

Lead incident response for complex virtualization, storage, or OS-level disruptions, and conduct blameless post-mortems and Root Cause Analysis (RCA).
Develop and maintain software tools (Python, PowerShell, Java) to automate infrastructure tasks via CI/CD pipelines.
Architect and manage AI-first monitoring systems (Grafana, ELK) for predictive failure detection.
Define and measure infrastructure-specific SLIs and SLOs (e.g., IOPS, disk latency, OS uptime) and manage error budgets.
Adopt and maintain Infrastructure as Code (IaC) using Terraform and Ansible for consistent deployments.

Requirements

Degree in Computer Science, Information Technology, or related Engineering field.
At least 5 years of relevant experience.
Intermediate-level administration of Windows Server (Active Directory, Clustering) and Red Hat Linux.
High proficiency in hypervisors (VMware) and enterprise storage architecture (SAN, NAS, S3).
Proficiency in Python and PowerShell for automation.
Hands-on experience with Grafana and the ELK stack (Elasticsearch, Logstash, Kibana) for monitoring.

Site Reliability (Infrastructure) Engineer (AVP/VP)
