OCBC

Site Reliability (Infrastructure) Engineer (AVP/VP)

Banking | OCBC Singapore | Onsite | Posted 4 weeks ago

About the role

The Site Reliability Engineer is responsible for the operation, reliability, and performance of platform infrastructure, ensuring high availability, scalability, and operational excellence.

Key Responsibilities

  • Lead incident response for complex virtualization, storage, or OS-level disruptions, and conduct blameless post-mortems and root cause analysis (RCA).
  • Develop and maintain software tools (Python, PowerShell, Java) that automate infrastructure tasks via CI/CD pipelines.
  • Architect and manage AI-first monitoring systems (Grafana, ELK) for predictive failure detection.
  • Define and measure infrastructure-specific SLIs and SLOs (e.g., IOPS, disk latency, OS uptime) and manage error budgets.
  • Adopt and maintain Infrastructure as Code (IaC) using Terraform and Ansible for consistent deployments.

Requirements

  • Degree in Computer Science, Information Technology, or related Engineering field.
  • At least 5 years of relevant experience.
  • Intermediate-level administration of Windows Server (Active Directory, Clustering) and Red Hat Linux.
  • High proficiency in hypervisors (VMware) and enterprise storage architecture (SAN, NAS, S3).
  • Proficiency in Python and PowerShell for automation.
  • Hands-on experience with Grafana and the ELK stack (Elasticsearch, Logstash, Kibana) for monitoring.