Job Description

Are you obsessed with system reliability, performance, and scalability? NexusCloud Systems is seeking a high-impact Senior Site Reliability Engineer to join our core infrastructure team. In this role, you will be the bridge between development and operations, ensuring our high-traffic global platforms remain resilient, performant, and secure.
You will have the autonomy to architect complex solutions, lead incident response, and drive a culture of 'automation-first' engineering.

Responsibilities

Design, build, and maintain highly scalable, distributed systems on AWS/GCP.
Drive capacity planning and performance tuning to optimize infrastructure costs.
Implement and manage CI/CD pipelines to streamline deployment velocity.
Lead post-mortem analysis and incident response for critical service outages.
Develop automated tooling and monitoring solutions to improve system observability.
Collaborate with cross-functional teams to define and enforce SLOs/SLIs.
Mentor junior engineers and advocate for SRE best practices throughout the SDLC.

Qualifications

5+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering.
Expert-level proficiency in Go, Python, or Java.
Deep expertise with Kubernetes orchestration and containerization (Docker).
Proven experience managing Infrastructure as Code (Terraform, Pulumi, or CloudFormation).
Deep understanding of Linux internals, networking, and distributed system design.
Experience with monitoring tools such as Prometheus, Grafana, or Datadog.
Strong problem-solving skills and the ability to thrive in a fast-paced environment.
