Job Description
Are you obsessed with uptime and performance? NexusScale is looking for an elite Senior Site Reliability Engineer to join our core infrastructure team in San Francisco. You will play a pivotal role in designing, automating, and maintaining the high-scale cloud-native environments that support millions of global users.
We are a remote-friendly, culture-first company that values engineering excellence, radical transparency, and continuous improvement. If you thrive in high-stakes environments and love building robust systems from the ground up, we want to hear from you.
Responsibilities
- Architect and maintain highly available, scalable cloud infrastructure on AWS/GCP.
- Drive capacity planning, performance tuning, and infrastructure cost optimization.
- Lead incident response and perform blameless post-mortems to improve system resilience.
- Implement Infrastructure as Code (IaC) best practices using Terraform and Kubernetes.
- Automate manual operational tasks to reduce technical debt and improve deployment velocity.
- Mentor junior engineers and promote a culture of operational excellence across the organization.
- Manage CI/CD pipelines to ensure seamless, zero-downtime application deployments.
Qualifications
- 5+ years of experience in Site Reliability Engineering, DevOps, or Systems Architecture.
- Deep proficiency with Linux systems, networking protocols, and distributed systems.
- Expert-level skills in Terraform, Kubernetes, and container orchestration.
- Strong programming skills in Python, Go, or Ruby for automation and tool development.
- Experience with observability stacks (Prometheus, Grafana, Datadog, or ELK).
- Proven ability to troubleshoot complex issues in high-traffic production environments.
- Bachelor's degree in Computer Science or Engineering, or equivalent practical experience.