Posted May 31, 2026

Senior Site Reliability Engineer (CloudVision as a Service)

Requirements
• BS/MS degree in Computer Science or a related subject, or relevant experience
• 5+ years of software engineering experience
• Experience developing or managing deployments of distributed database systems or scale-out applications in a SaaS environment
• Proficiency in Python, Golang, and/or other languages. You are expected to be comfortable in Bash and/or other scripting languages

What the job involves
• We’re looking for Site Reliability Engineers to join Arista’s growing CloudVision-as-a-Service (CVaaS) global SRE team
• SREs at Arista combine a strong software engineering background and systems architecture knowledge with a passion for operating production systems at scale
• We are responsible for our global CloudVision service fleet, ensuring scalability, reliability, and stability
• You’ll gain firsthand experience of being part of a rapidly growing product with a passionate group of engineers who unapologetically put product reliability and customer experience first
• We deeply believe in building highly automated and self-sustaining environments, prioritizing safe and efficient operations that leverage cutting-edge technologies and tools
• Arista’s CloudVision is an enterprise network management and streaming telemetry SaaS offering
• The CloudVision stack is entirely Kubernetes-native
• Familiarity with GCP (Google Cloud Platform) and GKE (Google Kubernetes Engine) is preferred
• Our technical stack includes, but is not limited to: Golang, Python, Ansible/Pulumi, Bash
• You will be expected to develop, operate, and work with many different types of databases, either directly on Kubernetes or through managed DB products
• We integrate with many different Open Source Software (OSS) projects that power our microservices stack, monitoring infrastructure, and much more
• As an SRE you’ll have the chance to drive, develop, and lead projects in any of the following areas:
  • Data Platform (NetDL) Architecture and Performance
  • Capacity Planning
  • Autoscaling
  • Disaster Recovery
  • Observability
  • Change Management - CI/CD
  • Service Network Architecture
  • Cost Optimizations
  • Infrastructure and Cloud-First Application Security
• You will also join a globally distributed, “follow the sun” on-call team, where you’ll:
  • Continuously improve operational processes by adding automation
  • Lead sustainable incident response and blameless postmortems