Job Description
At Nexus Cloud Infrastructure, we are building the backbone of the next generation of distributed systems. We are looking for a visionary Senior Site Reliability Engineer to join our elite team in San Francisco. You will be responsible for scaling our cloud-native platforms, optimizing global latency, and architecting fault-tolerant services that power millions of requests per second.
If you are passionate about automation, observability, and infrastructure-as-code, we want to talk to you.
Responsibilities
- Architect and maintain highly scalable distributed systems using Kubernetes and AWS.
- Drive capacity planning, performance tuning, and global infrastructure scaling efforts.
- Implement proactive monitoring and observability strategies using Prometheus, Grafana, and the ELK Stack.
- Automate manual operational workflows to reduce toil and improve system efficiency.
- Lead incident response rotations and conduct thorough blameless post-mortems.
- Collaborate with engineering squads to ensure high availability for mission-critical applications.
- Mentor junior SREs and foster a culture of rigorous engineering standards.
Qualifications
- 5+ years of experience in Site Reliability Engineering or DevOps-heavy roles.
- Expertise in cloud infrastructure (AWS/GCP/Azure) and Kubernetes orchestration.
- Proficiency in programming with Go, Python, or Rust.
- Deep understanding of IaC tools such as Terraform, Pulumi, or Crossplane.
- Proven experience with CI/CD pipeline optimization (Jenkins, GitHub Actions, ArgoCD).
- Strong knowledge of networking fundamentals (TCP/IP, DNS, load balancing) and security best practices.
- Ability to work effectively in a collaborative, fast-paced remote or hybrid environment.