Job Description

Are you obsessed with system uptime, performance at scale, and the relentless pursuit of automation? CloudScale Dynamics is looking for a seasoned Senior Site Reliability Engineer to join our core infrastructure team in San Francisco. You will be instrumental in building the next generation of our distributed systems, ensuring our global platform remains resilient, performant, and secure under extreme load.
We value engineers who treat operations as a software engineering problem. If you enjoy building tools that empower developers to ship faster while maintaining ironclad reliability, we want to hear from you.

Responsibilities

Design and manage highly available, fault-tolerant infrastructure on AWS/GCP.
Develop and maintain CI/CD pipelines to increase deployment velocity.
Implement observability solutions using Prometheus, Grafana, and the ELK stack.
Lead incident response efforts and conduct blameless post-mortems.
Automate manual operational tasks using Python, Go, or Terraform.
Collaborate with cross-functional engineering teams to optimize cloud spend and resource utilization.
Maintain system security through automated patching and compliance monitoring.

Qualifications

5+ years of experience in SRE, DevOps, or Systems Engineering.
Expertise in cloud infrastructure (AWS/GCP) and container orchestration (Kubernetes).
Strong proficiency in Infrastructure as Code (Terraform, Pulumi, or CloudFormation).
Advanced programming skills in Python, Go, or Bash.
Deep understanding of Linux internals, networking protocols, and distributed systems.
Experience with monitoring/alerting ecosystems and incident management tools (e.g., PagerDuty).
Excellent problem-solving skills and a proactive mindset toward system health.

Senior Site Reliability Engineer (SRE)
