Job Description
Are you obsessed with uptime, performance, and automation? NexusScale is looking for a Senior Site Reliability Engineer to join our core infrastructure team. In this role, you will be the architect of our reliability strategy, ensuring our global cloud footprint remains resilient as we scale to millions of users.
You will work alongside software engineers to bridge the gap between development and operations, implementing Infrastructure as Code (IaC) practices and observability frameworks in a high-traffic environment.
Responsibilities
- Design and maintain highly available, scalable, and secure cloud infrastructure.
- Automate manual operational tasks using Python, Go, or shell scripting.
- Lead incident response and run blameless post-mortems whose findings feed back into system architecture improvements.
- Manage CI/CD pipelines to ensure rapid, safe deployment of microservices.
- Optimize cloud resource utilization to balance performance with cost-efficiency.
- Collaborate with product teams to define and monitor SLOs/SLIs.
Qualifications
- 5+ years of experience in SRE, DevOps, or Systems Engineering.
- Deep expertise in AWS or GCP cloud architecture and service components.
- Strong proficiency in Infrastructure as Code (Terraform, Pulumi, or CloudFormation).
- Expert-level knowledge of Kubernetes orchestration and container security.
- Strong background in Linux internals, networking, and distributed systems.
- Excellent communication skills with the ability to lead cross-functional technical initiatives.