Job Description

Are you obsessed with uptime, scalability, and system performance? NexusCloud Solutions is seeking a Senior Site Reliability Engineer to join our core infrastructure team. In this role, you will bridge the gap between development and operations, ensuring our high-traffic global platforms remain resilient, performant, and secure. You will define the future of our cloud-native architecture.

Responsibilities

Design and maintain highly available, distributed cloud infrastructure on AWS/GCP.
Automate operational tasks using Infrastructure as Code (Terraform, Ansible).
Lead incident response, root cause analysis, and post-mortem investigations.
Optimize CI/CD pipelines to improve deployment velocity and reliability.
Implement advanced observability, monitoring, and alerting strategies using Prometheus and Grafana.
Collaborate with engineering teams to improve system architecture and fault tolerance.
Manage capacity planning and resource optimization to control cloud costs.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
5+ years of experience in SRE, DevOps, or Systems Engineering roles.
Proficiency in Go, Python, or Ruby for automation and tool development.
Deep expertise in container orchestration with Kubernetes and Docker.
Strong background in Linux system internals and networking protocols.
Proven experience managing large-scale, mission-critical production environments.
Excellent analytical, problem-solving, and communication skills.

Senior Site Reliability Engineer (SRE)
