Note: This is a remote position open to candidates in the USA. Effectual is seeking a Senior Data Engineer with specialized expertise in data streaming technologies to join its data team. This role focuses on building and maintaining high-performance data streaming architectures that enable real-time data processing and analytics.
Responsibilities
- Design, build, and maintain scalable streaming data architectures using Kafka, MSK, and Kinesis
- Develop real-time data pipelines that handle high-volume, high-velocity data streams
- Implement event-driven architectures and microservices patterns for streaming data processing
- Create and optimize data streaming topologies for complex event processing scenarios
- Design fault-tolerant streaming systems with proper error handling and data recovery mechanisms
- Configure, deploy, and manage Apache Kafka clusters and AWS MSK environments
- Implement Kafka Connect pipelines for streaming data integration
- Design optimal Kafka topic partitioning strategies and replication configurations
- Monitor and optimize Kafka cluster performance, throughput, and latency
- Implement Kafka security configurations including SSL/TLS, SASL, and ACLs
- Manage Kafka Schema Registry for data serialization and evolution
- Design and implement Amazon Kinesis Data Streams and Kinesis Data Firehose solutions
- Configure Kinesis Analytics applications for real-time stream processing
- Optimize Kinesis shard management and auto-scaling configurations
- Implement Kinesis data retention and archival strategies
- Integrate Kinesis with other AWS services for comprehensive streaming solutions
- Develop real-time stream processing applications using Apache Spark Streaming, Kafka Streams, or AWS Lambda
- Implement complex event processing (CEP) patterns for real-time analytics
- Build streaming ETL pipelines that transform data in motion
- Create real-time aggregations, windowing operations, and stateful stream processing
- Optimize streaming query performance and resource utilization
- Ensure seamless integration between streaming systems and data lakes, data warehouses, and operational databases
- Implement data lineage and monitoring for streaming data pipelines
- Create automated data quality checks and validation for streaming data
- Manage data serialization formats (Avro, JSON, Protobuf) and schema evolution
- Coordinate with data scientists and analysts to ensure streaming data meets analytical requirements
- Implement Infrastructure as Code (IaC) for streaming data platforms using Terraform or CloudFormation
- Automate deployment and management of streaming infrastructure through CI/CD pipelines
- Monitor streaming system health, performance metrics, and alerting
- Implement disaster recovery and high availability strategies for streaming systems
- Stay current with emerging trends in streaming technologies and cloud-native solutions
- Collaborate with data architects, data scientists, and application teams on streaming data requirements
- Support rigorous project governance through daily progress reviews and time tracking
- Provide technical leadership and mentorship to junior data engineers
- Communicate complex streaming concepts to technical and non-technical stakeholders
- Operate with transparency and responsiveness to support high-performing teams
Skills
- 7+ years of experience in the data engineering field with significant streaming data specialization
- Bachelor's degree in Computer Science, Engineering, or related STEM field
- Extensive hands-on experience with Apache Kafka including cluster management, performance tuning, and ecosystem tools
- Proven experience with AWS MSK and Amazon Kinesis services in production environments
- Strong background in real-time data processing and stream analytics
- Streaming Technologies: Apache Kafka, Kafka Connect, Kafka Streams, Amazon MSK, Amazon Kinesis (Data Streams, Data Firehose, Analytics)
- Programming Languages: Proficient in Python, Java, and Scala for streaming applications
- Stream Processing Frameworks: Apache Spark Streaming, Apache Flink, AWS Lambda for stream processing
- Data Serialization: Experience with Avro, Protocol Buffers, JSON, and schema registry management
- Big Data Technologies: Hadoop ecosystem, Apache Spark, distributed computing concepts
- Database Technologies: SQL and NoSQL databases, data warehousing solutions, time-series databases
- AWS Services: Deep knowledge of AWS streaming and analytics services (MSK, Kinesis, Lambda, EMR, Glue)
- Containerization: Docker and Kubernetes for streaming application deployment
- Infrastructure as Code: Terraform, CloudFormation for streaming infrastructure automation
- Monitoring: CloudWatch, Prometheus, Grafana for streaming system observability
- Security: Implementation of streaming data security, encryption, and access controls
- Expert use of code versioning tools such as GitHub
- Expert knowledge of Agile methodologies and delivery practices
- Experience with CI/CD pipelines for streaming data applications
- Understanding of data APIs, REST services, and microservices architectures
- Leadership & Team Management
- Risk Management and mitigation strategies for streaming systems
- Conflict Resolution
- Strategic Planning & Leadership for data streaming initiatives
- Resource Management and capacity planning
- Change Management for streaming technology adoption
- Core AWS Certifications: AWS Data Engineer Associate (required)
- AWS Solutions Architect Professional (preferred)
- AWS Developer Professional (recommended)
- Confluent Certified Administrator for Apache Kafka (highly recommended)
- Confluent Certified Developer for Apache Kafka (preferred)
- AWS Big Data Specialty (if available in current form)
- AWS Security Specialist
- Certified Associate Data Analyst with Python
- Certified Professional Python Programmer Level 1
- Databricks Data Engineer Professional
- Certified Associate Python Programmer
- Java or Scala certification (Oracle Certified Professional)
- Experience with Apache Flink for advanced stream processing
- Knowledge of Apache Pulsar as an alternative messaging system
- Experience with event sourcing and CQRS patterns
- Understanding of Apache Airflow for batch and streaming workflow orchestration
- Experience with ksqlDB for stream processing using SQL
- Background in financial services, IoT, or other real-time data intensive industries
- Experience with multi-cloud streaming architectures
- Knowledge of Apache NiFi for data flow automation
Benefits
- Medical, dental, and vision insurance
- Short-term disability, long-term disability, and life insurance
- 401(k) with company match
- Paid time off (PTO): 120 hours accrued over one year
- Paid time off for major holidays (14 days per year)
- These and any other employee benefit offerings are subject to management’s discretion and may change at any time.
Company Overview
Cloud Service Provider, AWS Premier Tier Services Partner, Generative and Agentic AI, Migration, Modernization
Effectual was founded in 2019 and is headquartered in Jersey City, New Jersey, USA, with a workforce of 201-500 employees. Its website is https://www.effectual.ai.
Company H1B Sponsorship
Effectual has a track record of offering H1B sponsorships: 3 in 2023, 3 in 2022, and 2 in 2021. Please note that this does not guarantee sponsorship for this specific role.