Job Description

Are you obsessed with system uptime, performance at scale, and automating the mundane? NexusCloud Systems is seeking a visionary Senior Site Reliability Engineer to join our core infrastructure team in San Francisco. You will play a pivotal role in designing, building, and maintaining the highly available cloud-native environments that power our global SaaS platform.
We don't just 'keep the lights on'; we engineer solutions that prevent outages before they happen. Join a culture of SRE excellence where innovation is encouraged and impact is visible.

Responsibilities

Architect, implement, and optimize scalable cloud infrastructure on AWS and Kubernetes.
Automate operational tasks using Infrastructure as Code (Terraform) and CI/CD pipelines.
Drive incident management and post-mortem analysis to enhance system resilience.
Implement proactive monitoring, logging, and alerting strategies to ensure 99.99% uptime.
Lead capacity planning and performance tuning to support rapid traffic growth.
Mentor junior engineers and promote a culture of operational excellence across teams.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
5+ years of experience in SRE, DevOps, or Software Engineering roles.
Expert-level proficiency with AWS, Kubernetes (EKS), and container orchestration.
Strong development skills in Go, Python, or Ruby for automation.
Deep understanding of observability tools like Prometheus, Grafana, and Datadog.
Proven ability to troubleshoot complex, distributed systems in a production environment.
Experience managing large-scale PostgreSQL or NoSQL database clusters.

Senior Site Reliability Engineer (SRE)
