We are looking for a Service Level & Availability Manager to support a global enterprise’s infrastructure operations and drive reliability across complex, hybrid environments. This role blends technical depth with operational rigor and collaboration across business and IT to ensure resilient, always-on service delivery.
This is a USA-based role.

Responsibilities
- Own end-to-end Service Level and Availability Management across on-premises, cloud, and third-party systems.
- Develop and maintain availability plans that align with business priorities and risk mitigation strategies.
- Monitor system health and performance using tools such as PowerBI, Datadog, Splunk, PagerDuty, and ServiceNow.
- Partner with Infrastructure, DevOps, and Application teams to embed availability practices into change, incident, and release processes.
- Define and track key metrics (uptime, reliability, MTTR/MTTI) and present trends and recommendations to technical and leadership teams.
- Foster a culture of proactive monitoring, resilience engineering, and continuous improvement.

Required Experience
- 5+ years of experience in Availability Management, Service Level Management, or related IT Service Delivery functions.
- Proven track record implementing availability frameworks aligned with ITIL or similar best practices.
- Solid understanding of infrastructure, cloud (AWS/Azure), and modern architectures (microservices, containers).
- Hands-on experience with monitoring, alerting, and analytics tools (PowerBI, Datadog, Splunk, PagerDuty, ServiceNow).
- Excellent communication and analytical skills, with the ability to convey technical insights to non-technical audiences.
- ITIL or cloud certifications (AWS/Azure) preferred.