Position Summary
We are seeking a highly experienced Principal Observability Architect to lead the design, implementation, modernization, and optimization of enterprise-scale observability and analytics platforms. This role will serve as the technical authority for log management, observability engineering, telemetry pipelines, AIOps, security analytics, and data lakehouse architectures leveraging Splunk, Databricks, Cribl, OpenTelemetry, and cloud-native technologies.
The ideal candidate possesses deep expertise in traditional observability platforms (Splunk, Dynatrace, AppDynamics, ServiceNow ITOM) and modern data lakehouse architectures utilizing Databricks, Delta Lake, Unity Catalog, and AI/ML-driven analytics. This individual will drive the strategic transformation from legacy SIEM and observability platforms toward scalable, cloud-native observability data lakes.
Key Responsibilities
Enterprise Architecture & Strategy
Define enterprise observability architecture standards, patterns, and roadmaps.
Lead observability transformation initiatives involving Splunk modernization and Databricks adoption.
Develop reference architectures for telemetry ingestion, storage, analytics, security, and AI-driven operations.
Align observability strategies with business, security, compliance, and operational objectives.
Create executive-level architecture presentations, business cases, and technology roadmaps.
Splunk Platform Leadership
Architect large-scale Splunk Enterprise and Splunk Cloud environments.
Design and optimize:
Indexer clusters
Search head clusters
Forwarder architectures
Deployment servers
Data models
ITSI implementations
Define ingestion, retention, indexing, and data lifecycle strategies.
Lead migration initiatives involving:
Splunk to Databricks
Heavy Forwarders to Cribl
SIEM modernization programs
Optimize SPL searches, data models, summary indexing, and dashboard performance.
Databricks & Lakehouse Architecture
Architect enterprise observability data lake solutions using:
Databricks Lakehouse
Delta Lake
Unity Catalog
Delta Live Tables
Structured Streaming
Mosaic AI
Genie
Design medallion architectures spanning the following layers:
Bronze
Silver
Gold
Develop governance strategies including:
RBAC
Data masking
Data lineage
Audit controls
Create high-performance log analytics solutions capable of supporting petabyte-scale telemetry environments.
Enable self-service analytics and AI-powered observability use cases.
Telemetry & Data Engineering
Design ingestion architectures supporting:
OpenTelemetry
OCSF
Syslog
Kafka
Azure Event Hubs
AWS Kinesis
GCP Pub/Sub
Cribl
Define normalization and enrichment frameworks.
Establish data quality and schema management processes.
Design real-time and batch processing pipelines.
AIOps & Advanced Analytics
Lead implementation of:
AIOps
Predictive analytics
Root cause analysis
Anomaly detection
Event correlation
Integrate observability datasets with AI/ML platforms.
Develop observability use cases leveraging:
Mosaic AI
Agentic AI
LLMs
Generative AI
Build operational intelligence and executive KPI dashboards.
Security & Compliance
Architect observability solutions supporting:
SOC operations
Threat hunting
Security analytics
Compliance reporting
Design frameworks aligned with:
HIPAA
PCI-DSS
SOX
NIST
ISO/IEC 27001
Implement data governance and security controls across observability platforms.
Leadership & Governance
Provide technical leadership to engineering teams.
Mentor architects, engineers, and developers.
Conduct architecture reviews and design governance.
Define platform standards, best practices, and operational procedures.
Engage directly with executive stakeholders and business leaders.
Required Qualifications
Experience
10+ years of experience in Enterprise Observability, Monitoring, or Security Analytics.
5+ years architecting large-scale Splunk environments.
3+ years designing Databricks Lakehouse architectures.
Experience managing environments with daily ingestion volumes of:
50+ TB/day (preferred)
100+ TB/day (strongly preferred)
Experience leading enterprise transformation programs.
Splunk Expertise
Deep expertise in:
Splunk Enterprise
Splunk Cloud
Splunk ITSI
Enterprise Security
SPL Development
Data Models
Indexer Clustering
Search Head Clustering
SmartStore
Heavy Forwarders
Universal Forwarders
Databricks Expertise
Strong experience with:
Databricks Lakehouse
Delta Lake
Unity Catalog
Delta Live Tables
Structured Streaming
Databricks SQL
Genie
Mosaic AI
Lakehouse Federation
Cloud Platforms
Experience with one or more of the following:
Microsoft Azure
Amazon Web Services
Google Cloud
Data Technologies
Strong knowledge of:
Kafka
OpenTelemetry
OCSF
Iceberg
Spark
SQL
Python
REST APIs
Event Streaming Architectures
Preferred Qualifications
Experience with Cribl Stream and Cribl Edge
Experience with Dynatrace, AppDynamics, Datadog, or New Relic
Experience with ServiceNow ITOM/Event Management
Experience designing AI/ML operational analytics solutions
Experience with Security Data Lakes and SIEM modernization initiatives
Experience with FinOps and cloud cost optimization
Experience building observability platforms for healthcare, financial services, retail, or large enterprise organizations
Certifications (Preferred)
Splunk
Splunk Enterprise Certified Architect
Splunk Core Certified Consultant
Databricks
Databricks Certified Data Engineer Professional
Databricks Certified Solutions Architect
Cloud
Azure Solutions Architect Expert
AWS Solutions Architect Professional
Google Professional Cloud Architect
Success Metrics
Within the first 12 months, the architect will:
Deliver an enterprise observability architecture roadmap.
Reduce observability platform costs through modernization initiatives.
Design and implement a scalable observability data lake architecture.
Improve telemetry ingestion performance and reliability.
Enable AI-powered analytics and operational intelligence capabilities.
Establish enterprise governance standards for observability and security telemetry.
Support petabyte-scale observability and security analytics workloads.
Ideal Background
Candidates from healthcare, banking, retail, telecommunications, logistics, cloud provider, or managed services organizations that operate large-scale observability environments are highly desirable. Experience supporting environments that generate 100+ TB of telemetry per day, and integrating Splunk, Databricks, Cribl, OpenTelemetry, and cloud-native data platforms, is strongly preferred.