Description:
• Own CI/CD pipelines and gate releases against AI-specific failure modes such as eval regressions, groundedness drift, and performance regressions.
• Build observability for AI behavior in production beyond standard infrastructure metrics.
• Design test strategies for workflows that interact with physical hardware, using simulator-first and hardware-on-demand approaches.
• Drive AI-assisted engineering practices, including tooling, prompt review, and evaluation of AI code suggestions.
• Support platform quality and release reliability for the reasoning and execution layer deployed in customer environments.
• Collaborate on safe shipping practices for production systems where reliability, observability, and auditability are critical.
• Develop and maintain testing and validation workflows for AI systems and related deployment pipelines.
Requirements:
• 3+ years of combined QA and DevOps experience.
• Direct experience in software QA and modern testing workflows, including automation.
• Proven experience owning a CI/CD pipeline end-to-end in production.
• Hands-on production experience with a major cloud platform: AWS preferred, Azure or GCP also considered.
• Experience testing and debugging APIs with Postman, REST Assured, or custom tooling.
• Working exposure to DevOps practices, deployment pipelines, and CI/CD systems.
• Daily comfort using AI-assisted engineering tools such as Cursor, GitHub Copilot, Claude Code, or similar.
• Familiarity with at least one observability stack such as Elastic, Datadog, Splunk, or Grafana.
• Familiarity with Docker and containerized environments.
• Experience designing test or evaluation workflows for AI systems, such as LLM output validation, RAG pipeline testing, or prompt-based test orchestration.
• Strong troubleshooting and analytical skills.
• Experience with GitHub Actions, Azure DevOps, or comparable CI/CD tooling.
• Experience in a startup or SaaS environment where QA also supported operations.
Benefits:
• Flexible / unlimited time off.
• Health insurance.
• Equity participation, discussed at offer.
• Fully remote work.
• Architectural ownership of work that ships to real enterprise customers.
• Direct working relationships with the people setting platform strategy.
• A growth-stage platform where your first-year decisions can shape the product for years.
• AI-assisted tooling licensed by NetSpeek, such as Cursor, Claude Code, GitHub Copilot, or comparable tools.