Job Description

Are you obsessed with system uptime and scalability? Join CloudScale Innovations, where we are building the next generation of high-availability cloud infrastructure. We are seeking a Senior Site Reliability Engineer to help us bridge the gap between development and operations, ensuring our global services remain resilient, performant, and secure.
You will play a pivotal role in automating infrastructure, optimizing CI/CD pipelines, and driving a culture of blameless post-mortems.

Responsibilities

Design, implement, and maintain highly available and scalable cloud infrastructure.
Automate manual operational tasks using Infrastructure as Code (Terraform, Ansible).
Lead incident response efforts and conduct blameless post-mortems to improve system reliability.
Optimize cloud costs and performance through rigorous monitoring and capacity planning.
Collaborate with cross-functional engineering teams to integrate reliability best practices into the SDLC.
Participate in an on-call rotation to ensure 99.99% service availability.

Qualifications

5+ years of experience in SRE, DevOps, or Systems Engineering roles.
Deep expertise in AWS or GCP cloud services and Kubernetes orchestration.
Proficiency in programming with Python, Go, or Ruby for infrastructure automation.
Strong grasp of Linux internals, networking, and distributed systems architecture.
Experience with observability tools such as Prometheus, Grafana, and Datadog.
Excellent communication skills and the ability to thrive in a fast-paced, collaborative environment.

Senior Site Reliability Engineer (SRE)
