Job Description:
• Lead and execute the vision, strategy, and roadmap for Reddit’s large-scale GenAI Platform.
• Define the platform architecture and operating model that enable teams to build, deploy, and scale GenAI products reliably.
• Drive the strategy for a unified LLM Gateway supporting internally and externally hosted LLMs through consistent APIs and abstractions.
• Set the direction for core platform capabilities such as rate and token limit management, intelligent failover, and production resilience.
• Shape Reddit’s approach to an enterprise-grade RAG system.
• Establish the strategic direction for agentic AI workflows and tool-use patterns across the platform.
• Own the end-to-end platform strategy from concept through production adoption and long-term evolution.
• Drive MLOps and LLMOps standards across CI/CD, testing, versioning, evaluation, and lifecycle management.
• Define best practices for observability, monitoring, governance, and operational excellence across GenAI systems.
• Partner across engineering, product, and leadership to align platform investments with company priorities and user needs.
• Champion platform thinking with a strong focus on scalability, reliability, performance, and developer experience.
• Influence technical direction across teams by turning emerging AI capabilities into a scalable platform strategy.
Requirements:
• 10+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
• Have a track record of leading technical strategy and delivering AI platforms in cloud-based production environments at scale.
• Demonstrate strong execution by turning strategy into action, driving complex initiatives end to end, and consistently delivering high-quality platform outcomes.
• Bring deep experience operating Kubernetes and other orchestration systems in large-scale production environments.
• Deep experience with cloud-based technologies for supporting an ML platform, including tools like AWS, Google Cloud Storage, and infrastructure-as-code (Terraform).
• Proficiency with the common programming languages and frameworks of ML, such as Go and Python.
• Excellent communication skills with the ability to articulate technical AI concepts to non-technical stakeholders.
• Strong focus on scalability, reliability, performance, and developer experience. You are an undying advocate for platform users and have a deep intuition for the GenAI product development lifecycle.
• Strong knowledge of model serving, inference pipelines, monitoring, and observability for AI systems is a plus.
Benefits:
• Comprehensive Healthcare Benefits and Income Replacement Programs
• 401k with Employer Match
• Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
• Family Planning Support
• Gender-Affirming Care
• Mental Health & Coaching Benefits
• Flexible Vacation & Paid Volunteer Time Off
• Generous Paid Parental Leave