Information Technology 🏢 Full Time ⭐️ Verified

Senior Site Reliability Engineer (SRE)

Nexus Cloud Infrastructure
San Francisco
Estimated Salary
USD 175,000 – USD 225,000
Live Update
2 June 2026
Deadline
2 June 2027

Job Description

Are you obsessed with uptime, scalability, and system performance? Nexus Cloud Infrastructure is looking for a Senior Site Reliability Engineer to join our core platform team. We manage high-traffic distributed systems and need an expert to bridge the gap between software development and IT operations. You will play a pivotal role in designing robust infrastructure that powers mission-critical applications for our global clients.

We value engineers who automate everything, treat infrastructure as code, and thrive in complex, high-pressure environments.

Responsibilities

  • Design, implement, and maintain highly available, scalable, and secure cloud infrastructure on AWS/GCP.
  • Drive capacity planning, performance tuning, and cost optimization initiatives across our microservices architecture.
  • Lead incident response and root-cause analysis efforts for production outages, implementing long-term preventative measures.
  • Automate manual operational workflows using Python, Go, or shell scripting.
  • Develop and manage CI/CD pipelines to ensure rapid, reliable code deployments.
  • Establish and enforce SLOs, SLAs, and SLIs to maintain system reliability standards.
  • Mentor junior engineers and promote a culture of operational excellence across the engineering department.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
  • 5+ years of experience in SRE, DevOps, or Software Engineering roles.
  • Deep proficiency in cloud platforms (AWS or GCP) and container orchestration tools like Kubernetes.
  • Strong background in Linux system administration and networking (TCP/IP, DNS, Load Balancing).
  • Experience with Infrastructure as Code (Terraform, CloudFormation, or Ansible).
  • Expertise in observability and monitoring tools like Prometheus, Grafana, Datadog, or ELK Stack.
  • Proven ability to troubleshoot complex distributed systems in a production environment.
  • Proficiency in at least one high-level programming language such as Go, Python, or Ruby.

Required Skills

Kubernetes, AWS, Terraform, Go, Python, Observability, CI/CD, Distributed Systems, SRE

Ready to Take On This Challenge?

Make sure your resume is ready. Submit your application now, before the deadline.