Job Description
Are you obsessed with uptime, scalability, and system performance? NexusScale is looking for a Senior Site Reliability Engineer to help us build and maintain high-performance infrastructure that powers the next generation of cloud applications. In this role, you will bridge the gap between development and operations, ensuring our systems are robust, automated, and lightning-fast.
You will work with a world-class team of engineers, influencing architecture and driving the adoption of best-in-class SRE practices across our global footprint.
Responsibilities
- Design, build, and maintain highly available and scalable distributed systems on GCP/AWS.
- Implement automation for infrastructure provisioning and configuration management using Terraform and Ansible.
- Champion observability by developing robust monitoring, alerting, and logging solutions.
- Lead incident response efforts and conduct blameless post-mortems to improve system reliability.
- Optimize cloud costs through resource utilization analysis and auto-scaling strategies.
- Collaborate with engineering teams to improve software delivery pipelines and CI/CD workflows.
- Develop internal tools to improve developer productivity and self-service capabilities.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
- 5+ years of experience in SRE, DevOps, or systems engineering roles.
- Deep expertise in Linux systems administration and container orchestration (Kubernetes).
- Strong proficiency in at least one programming language such as Go, Python, or Java.
- Proven track record managing large-scale production environments with complex microservices architectures.
- Solid understanding of networking concepts (TCP/IP, DNS, load balancing) and security best practices.
- Experience with IaC tools such as Terraform or CloudFormation.