Job Description

Are you obsessed with system uptime, performance at scale, and automating the mundane? CloudScale Systems is looking for a Senior Site Reliability Engineer to join our core infrastructure team in San Francisco. You will play a pivotal role in designing, building, and maintaining our global cloud-native architecture. We value engineers who view operations as a software problem and thrive in high-stakes environments.

Responsibilities

Architect and maintain highly available, scalable cloud infrastructure on AWS/Kubernetes.
Implement Infrastructure as Code (IaC) using Terraform to ensure environment consistency.
Develop and automate CI/CD pipelines to increase deployment velocity.
Lead incident response and perform deep-dive post-mortem analyses to prevent recurrence.
Optimize system performance and resource utilization to control cloud infrastructure costs.
Define and implement Service Level Objectives (SLOs) and error budgets.
Mentor junior engineers on best practices for observability and system reliability.

Qualifications

5+ years of experience in SRE, DevOps, or Systems Engineering roles.
Expertise in containerization and orchestration with Docker and Kubernetes in production environments.
Strong proficiency in scripting/coding (Python, Go, or Bash).
Deep understanding of cloud networking, security, and storage on AWS.
Demonstrated experience with monitoring and observability stacks (Prometheus, Grafana, ELK, or Datadog).
Strong analytical skills with a proactive approach to troubleshooting complex distributed systems.
Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.

Senior Site Reliability Engineer (SRE)
