Job Description

Are you obsessed with system reliability, performance, and automation? NexusCloud Systems is looking for a Senior Site Reliability Engineer to help us build and maintain our next-generation cloud infrastructure. You will be the bridge between development and operations, ensuring our high-scale services remain resilient, secure, and performant.
We operate at massive scale and believe in 'everything as code.' If you are passionate about reducing toil and optimizing system performance, we want to talk to you.

Responsibilities

Design, build, and maintain scalable infrastructure on AWS and Kubernetes.
Automate manual operational processes to eliminate toil and improve system efficiency.
Lead incident response, root cause analysis, and post-mortem investigations for production systems.
Define and implement Service Level Objectives (SLOs) and Error Budgets.
Collaborate with software engineering teams to improve application performance and reliability.
Manage CI/CD pipelines to ensure seamless, secure, and rapid deployment of services.
Mentor junior engineers and promote a culture of operational excellence.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
5+ years of experience in SRE, DevOps, or Software Engineering roles.
Proficiency in programming languages such as Go, Python, or Java.
Expertise in container orchestration tools, specifically Kubernetes and Helm.
Deep understanding of cloud infrastructure (AWS/GCP) and Infrastructure as Code (Terraform).
Solid experience with monitoring and logging stacks like Prometheus, Grafana, and ELK.
Strong problem-solving skills and the ability to work under pressure during high-impact outages.
