Job Description

Are you obsessed with uptime, scalability, and distributed systems? NexusCloud Systems is looking for an elite Senior Site Reliability Engineer to join our core infrastructure team. In this role, you will be the bridge between development and operations, ensuring our global, high-traffic platforms remain performant and resilient under heavy load. You will design, build, and maintain the systems that power our cloud-native environment, driving automation and reliability across the board.

Responsibilities

Architect and maintain highly available, scalable cloud infrastructure on AWS.
Automate manual operational tasks using Infrastructure as Code (Terraform, Pulumi).
Lead incident response and perform deep-dive blameless post-mortems for production issues.
Optimize system performance, latency, and resource utilization through rigorous monitoring.
Develop and manage CI/CD pipelines to facilitate rapid, safe deployment cycles.
Collaborate with product engineering teams to influence design choices for reliability.
Implement robust security measures and compliance protocols across our infrastructure.

Qualifications

5+ years of experience in SRE, DevOps, or Systems Engineering at a high-growth tech company.
Expertise in cloud infrastructure (AWS preferred) and container orchestration (Kubernetes).
Strong proficiency in Go, Python, or Ruby for automation and tool development.
Deep understanding of observability tools like Prometheus, Grafana, Datadog, or New Relic.
Solid grasp of networking fundamentals (DNS, load balancing, TLS, HTTP/HTTPS).
Proven experience with configuration management (Ansible, Chef) and CI/CD tools.
Ability to participate in an on-call rotation to maintain 99.99% system availability.

Senior Site Reliability Engineer (SRE)

