Key Responsibilities
• Design, develop, and maintain scalable ETL pipelines and data processing applications
• Build and optimize data workflows using PySpark, Java, and Hadoop ecosystem tools
• Analyze business and technical requirements to produce detailed implementation designs
• Perform unit testing, integration testing, and debugging of applications
• Troubleshoot and resolve performance issues related to high-volume data processing
• Develop and maintain SQL queries, stored procedures, and database objects
• Work with structured and unstructured datasets for healthcare analytics
• Generate statistical reports and support data validation processes
• Collaborate with cross-functional teams to ensure end-to-end data pipeline efficiency
• Follow software engineering best practices and maintain code quality standards
Required Skills & Experience
• Strong experience in ETL development, data processing, and database technologies
• 5+ years of experience with Microsoft SQL Server and relational databases
• Expertise in SQL performance tuning, indexing strategies, and query optimization
• 2+ years of experience with Hadoop ecosystem tools (HDFS, Hive, Impala, Spark, Kafka, Oozie, YARN, Sqoop, Hue)
• Hands-on experience with PySpark, Python, and/or Java
• Experience working with large-scale data processing frameworks
• Strong understanding of data transformation and data movement technologies
• Ability to handle high-volume structured and unstructured datasets
• Good understanding of the end-to-end application and data pipeline lifecycle