Job Description

At Nexus Cloud Infrastructure, we are building the backbone of the next generation of distributed systems. We are looking for a visionary Senior Site Reliability Engineer to join our elite team in San Francisco. You will be responsible for scaling our cloud-native platforms, optimizing global latency, and architecting fault-tolerant services that serve millions of requests per second.
If you are passionate about automation, observability, and infrastructure-as-code, we want to talk to you.

Responsibilities

Architect and maintain highly scalable distributed systems using Kubernetes and AWS.
Drive capacity planning, performance tuning, and global infrastructure scaling efforts.
Implement proactive monitoring and observability strategies using Prometheus, Grafana, and ELK.
Automate manual operational workflows to reduce toil and improve system efficiency.
Lead incident response rotations and conduct thorough blameless post-mortems.
Collaborate with engineering squads to ensure high availability for mission-critical applications.
Mentor junior SREs and foster a culture of rigorous engineering standards.

Qualifications

5+ years of experience in Site Reliability Engineering or DevOps-heavy roles.
Expertise in cloud infrastructure (AWS/GCP/Azure) and Kubernetes orchestration.
Proficiency in programming with Go, Python, or Rust.
Deep understanding of IaC tools such as Terraform, Pulumi, or Crossplane.
Proven experience with CI/CD pipeline optimization (Jenkins, GitHub Actions, ArgoCD).
Strong knowledge of networking protocols (TCP/IP, DNS, Load Balancing) and security best practices.
Ability to work effectively in a collaborative, fast-paced remote or hybrid environment.

Senior Site Reliability Engineer (SRE)
