Job Description
Are you obsessed with uptime, scalability, and system performance? NexusScale is looking for a Senior Site Reliability Engineer (SRE) to join our infrastructure team in San Francisco. You will play a critical role in designing, building, and maintaining our high-traffic cloud infrastructure, keeping our platform fast and resilient as we scale globally.
You will work at the intersection of software engineering and systems operations, using automation to solve complex problems at scale.
Responsibilities
- Design and implement highly available, scalable, and secure cloud infrastructure.
- Automate manual operational tasks using Python, Go, or Terraform.
- Lead incident response and perform post-mortem analysis to prevent recurrence.
- Optimize CI/CD pipelines to improve developer velocity and deployment stability.
- Collaborate with engineering teams to integrate reliability best practices into the development lifecycle.
- Manage capacity planning and resource forecasting to keep infrastructure cost-efficient.
- Mentor junior engineers on infrastructure-as-code and observability standards.
Qualifications
- 5+ years of experience in SRE, DevOps, or Systems Engineering roles.
- Deep expertise in AWS or GCP cloud environments and managed services.
- Strong proficiency in infrastructure-as-code tools such as Terraform or Pulumi.
- Advanced knowledge of Kubernetes orchestration and container security.
- Strong programming skills in Python, Go, or Bash.
- Experience with observability stacks like Prometheus, Grafana, and Datadog.
- Bachelor’s or Master’s degree in Computer Science or equivalent practical experience.