Information Technology 🏢 Full Time ⭐️ Verified

Senior Site Reliability Engineer (SRE)

NexusScale Cloud Infrastructure
San Francisco
Estimated Salary
USD 170,000 – USD 210,000
Live Update
2 June 2026
Deadline
2 Jun 2027

Job Description

Are you obsessed with uptime, performance, and building resilient systems at scale? NexusScale is looking for a Senior Site Reliability Engineer to join our core infrastructure team. In this role, you will bridge the gap between development and operations, ensuring our mission-critical services remain highly available and performant for millions of users worldwide.

You will work in a high-impact, cloud-native environment where you'll have the autonomy to architect solutions that define the future of our platform.

Responsibilities

  • Design, implement, and maintain highly available distributed systems on AWS and GCP.
  • Automate infrastructure provisioning and configuration management using Terraform and Ansible.
  • Champion SRE best practices, including error budgets, incident response, and post-mortems.
  • Scale our container orchestration platform (Kubernetes) to support rapid service growth.
  • Improve system observability through advanced logging, monitoring, and tracing stacks (Datadog, Prometheus, Grafana).
  • Collaborate with cross-functional engineering teams to optimize application performance and latency.
  • Manage capacity planning and perform proactive performance tuning for critical backend services.

Qualifications

  • 5+ years of experience in SRE, DevOps, or high-scale systems engineering.
  • Deep proficiency in Kubernetes, including cluster management and troubleshooting.
  • Strong development skills in Go, Python, or Java with a focus on writing maintainable automation scripts.
  • Extensive experience with Infrastructure as Code (IaC) tooling, specifically Terraform.
  • Proven expertise in managing large-scale production environments on cloud providers (AWS/GCP/Azure).
  • Strong understanding of Linux internals, networking, and security best practices.
  • Ability to participate in an on-call rotation and lead complex incident resolution processes.

Required Skills

Kubernetes, Go, Python, Terraform, AWS, GCP, Prometheus, SRE, Linux, CI/CD

Ready to Take On This Challenge?

Make sure your resume is ready. Submit your application now, before the deadline.
