Job Description
Are you obsessed with system uptime, latency, and scalability? CloudScale Innovations is seeking a Senior Site Reliability Engineer (SRE) to join our infrastructure team. You will be the architect behind our high-traffic, globally distributed systems, ensuring reliability and performance for millions of users.
We foster a culture of blameless post-mortems, an automation-first mindset, and engineering excellence. If you are a proponent of Infrastructure as Code and enjoy solving complex distributed systems puzzles, we want to hear from you.
Responsibilities
- Design, build, and maintain highly available, scalable, and resilient distributed systems.
- Automate manual operational tasks to minimize toil and improve system efficiency.
- Lead incident response, run blameless post-mortems, and perform capacity planning.
- Optimize cloud infrastructure costs without compromising performance or reliability.
- Collaborate with product and engineering teams to build architectural best practices into designs early.
- Develop and refine monitoring, alerting, and observability tooling, including dashboards, for production environments.
Qualifications
- 5+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering.
- Expert-level proficiency with at least one major cloud platform (AWS, GCP, or Azure).
- Strong hands-on experience with Kubernetes, Docker, and service mesh architectures.
- Proficiency in Go, Python, or Java for automation and tooling development.
- Advanced experience with Infrastructure as Code (Terraform, Pulumi, or CloudFormation).
- Deep understanding of observability tools such as Prometheus, Grafana, Datadog, or Honeycomb.
- Strong background in CI/CD pipeline optimization and automated testing strategies.