Requirements
• BS/MS degree in Computer Science or a related subject, or relevant experience,
• 5+ years software engineering experience,
• Experience developing or managing deployments of distributed database systems or scale out applications for a SaaS environment,
• Proficiency in Python, Golang, and/or other languages. Expected to be comfortable in Bash and/or other scripting languages
What the job involves
• We’re looking for Site Reliability Engineers to join Arista’s growing CloudVision-as-a-Service (CVaaS) global SRE team,
• SREs at Arista combine a strong software engineering background and systems architecture knowledge with a passion for operating production systems at scale,
• We are responsible for our global CloudVision service fleet, ensuring scalability, reliability, and stability,
• You’ll gain firsthand experience working on a rapidly growing product with a passionate group of engineers who unapologetically put product reliability and customer experience first,
• We deeply believe in building highly automated and self-sustaining environments, prioritizing safe and efficient operations that leverage cutting edge technologies and tools,
• Arista’s CloudVision is an enterprise network management and streaming telemetry SaaS offering,
• The CloudVision stack is entirely Kubernetes-native,
• Familiarity with GCP (Google Cloud Platform) and GKE (Google Kubernetes Engine) is preferred,
• Our technical stack includes, but is not limited to: Golang, Python, Ansible/Pulumi, Bash,
• You will be expected to develop, operate, and work with many different types of databases, either directly on Kubernetes or through managed DB products,
• We integrate with many different Open Source Software (OSS) projects that power our microservices stack, monitoring infrastructure, and much more,
• As an SRE you’ll have the chance to drive, develop, and lead projects in any of the following areas:
• Data Platform (NetDL) Architecture and Performance,
• Capacity Planning,
• Autoscaling,
• Disaster Recovery,
• Observability,
• Change Management - CI/CD,
• Service Network Architecture,
• Cost Optimizations,
• Infrastructure and Cloud-First Application Security,
• You will also join a globally distributed, “follow the sun” on-call team where you’ll:
• Continuously improve operational processes by adding automation,
• Lead sustainable incident response and blameless postmortems