• Incident Management & Resolution: Act as the primary point of contact for high-priority production incidents. Drive timely resolution, perform root cause analysis (RCA), and implement preventive measures to minimize future occurrences.
• Application Monitoring & Health: Proactively monitor the health, performance, and capacity of production applications using monitoring tools such as Splunk and New Relic. Develop and maintain dashboards, alerts, and runbooks.
• Change Management: Evaluate, approve, and oversee production changes, adhering strictly to Change Management protocols to ensure stability and minimize risk. Participate in release and deployment activities.
• Performance Optimization: Identify performance bottlenecks in application code and infrastructure (Java, database, cache) and collaborate with development teams to implement fixes and efficiency improvements.
• System Maintenance: Perform regular system maintenance, health checks, and capacity planning for application infrastructure running on AWS and Pivotal Cloud Foundry (PCF).
• Documentation & Knowledge Sharing: Create and maintain comprehensive support documentation, knowledge base articles, and troubleshooting guides.
• On-Call Support: Participate in an on-call rotation to provide 24/7 support for critical production systems.
Required Technical Skills & Experience (7-10 Years)