Accenture Federal Services

Washington, DC

Posted on: 05 February 2026

Senior Site Reliability Engineer Senior Manager

At Accenture Federal Services, nothing matters more than helping the US federal government make the nation stronger and safer and life better for people. Our 13,000+ people are united in a shared purpose to pursue the limitless potential of technology and ingenuity for clients across defense, national security, public safety, civilian, and military health organizations.

Join Accenture Federal Services, a technology company and part of global Accenture, to do work that matters in a collaborative and caring community, where you feel like you belong and are empowered to grow, learn and thrive through hands-on experience, certifications, industry training and more.

Join us to drive positive, lasting change that moves missions and the government forward!

You Are:

We are seeking a Senior Site Reliability Engineer (SRE) with deep expertise in building and maintaining reliable, scalable systems and a passion for optimizing the performance, reliability, and efficiency of technical infrastructure. The ideal candidate will have a strong background in site reliability engineering principles, extensive experience with automation, and a proven ability to collaborate across teams to ensure seamless service delivery.

The Work:

• Design, build, and maintain reliable, scalable, and high-performance infrastructure and services to support business needs.
• Implement and advocate for SRE best practices, including automation, CI/CD pipelines, monitoring, and incident management.
• Collaborate with cross-functional teams to develop systems that meet high availability, performance, and reliability standards.
• Drive incident management processes, including root cause analysis, mitigation strategies, and long-term preventive measures.
• Establish, monitor, and refine service level objectives (SLOs), service level agreements (SLAs), and key performance indicators (KPIs) to ensure systems adhere to reliability and performance targets.
• Automate repetitive tasks to improve operational efficiency and reduce manual intervention.
• Build and maintain robust monitoring, logging, and alerting systems to ensure visibility into system performance and reliability.
• Provide technical mentorship and guidance to team members, fostering a culture of knowledge sharing and continuous improvement.
• Act as a technical leader by driving solutions to complex challenges, ensuring alignment with organizational goals.
• Prepare and deliver performance and reliability reports to stakeholders, offering insights and recommendations for improvements.

Here's What You Need:

• Proven experience in site reliability engineering or a similar role, with a focus on application and infrastructure scalability, reliability, and performance.
• Strong knowledge of ITSM principles and incident management processes.
• Expertise in automation tools, scripting, and infrastructure-as-code (IaC) technologies.
• Proficiency with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).
• Experience with cloud platforms (e.g., AWS, Azure, GCP) and container technologies (e.g., Docker, Kubernetes).
• Strong analytical and problem-solving skills, with the ability to troubleshoot complex systems.
• Excellent communication and collaboration abilities, with a focus on cross-team partnerships.
• A passion for continuous learning, innovation, and driving imp
