Job Description

Are you obsessed with system performance, scalability, and uptime? NexusCloud Systems is seeking a highly skilled Senior Site Reliability Engineer to join our core infrastructure team. In this role, you will be the bridge between development and operations, ensuring our global cloud infrastructure remains resilient, performant, and secure. You'll work in a fast-paced environment where your work directly impacts the experience of millions of users.

Responsibilities

Architect and maintain highly scalable, distributed cloud infrastructure on AWS/GCP.
Develop and implement automation scripts using Python, Go, or Bash to reduce manual operational toil.
Proactively monitor system performance and troubleshoot complex issues in production environments.
Lead incident response efforts and conduct blameless post-mortems to improve future system reliability.
Optimize CI/CD pipelines to ensure seamless and reliable software deployments.
Collaborate with software engineering teams to design resilient application architectures.
Define and maintain SLOs, SLIs, and error budgets for mission-critical services.

Qualifications

5+ years of experience in Site Reliability Engineering or DevOps roles.
Deep expertise in Docker, Kubernetes, and other container orchestration platforms.
Proficiency in infrastructure-as-code tools such as Terraform or Pulumi.
Strong experience with monitoring and observability stacks (e.g., Prometheus, Grafana, Datadog).
Solid understanding of cloud networking, security protocols, and Linux system internals.
Demonstrated ability to script and automate complex tasks in Python or Go.
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.

