Opinions

Shutdown (re)boot camp: Getting your system and team through unexpected downtime

The 35-day government shutdown that ran from December 2018 into January 2019 not only affected 800,000 federal employees, but also brought business and technology operations across the government to a standstill. Some experts speculate that this may be the “new normal.” In case such events do become the norm, federal agencies must be prepared to handle extended outages and be ready to turn the lights back on at a moment’s notice.

While a shutdown — or any other outage — may well be the greatest test of your modernization effort, embracing rapid development and Agile practices will set you up for survival and success. What follows are a few suggestions for leaders on preparing for and navigating a government shutdown.

Have a plan

Day One of an outage is not the time to plan for unexpected downtime. Developing a well-thought-out contingency plan — including a communications strategy — can save your organization a lot of time and headaches during a shutdown. To do this, you must really understand your system and users. Here are some tips:

  • Know your system’s usage patterns. Measure the volume of transactions that occur on a typical business day, the average week and across the month — paying special attention to any peak periods. Use this information to calculate how an outage that may last days, weeks or even months will impact your system when it comes back online.
  • Assess the impact. What downstream and/or upstream systems will be impacted by your system being offline? What communications need to happen with those stakeholders to prepare them for a possible shutdown? Make sure you’ve got the right communications channels in place.
  • Develop a communications plan. Include individuals who may have to step into new roles due to a furlough, and outline how the outage affects customers as well as ways to address it. To begin, assume that either upstream or downstream systems will have issues, and come up with potential solutions while also monitoring the health of those systems and building resiliency around them.

For example, if your system is used to process benefit claims but will be unavailable during the shutdown, there will probably be a high volume of backlogged requests once the system is back up and running. Think through potential issues to help map out next steps, such as notifying users that there will be delays and providing regular status updates after the system is back online.
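To make the benefit-claims example concrete, here is a rough sketch of the arithmetic in Python. The function names, the `deferred_fraction` parameter and every number are hypothetical and chosen only for illustration; a real estimate would use your own measured daily volumes and peak periods.

```python
import math


def estimate_backlog(daily_volume, outage_business_days, deferred_fraction=1.0):
    """Requests expected to be waiting when the system comes back online.

    deferred_fraction is an assumption: the share of requests that will be
    submitted later rather than abandoned during the outage.
    """
    return round(daily_volume * outage_business_days * deferred_fraction)


def days_to_clear(backlog, daily_volume, daily_capacity):
    """Business days needed to work off the backlog while normal traffic continues."""
    spare_capacity = daily_capacity - daily_volume
    if spare_capacity <= 0:
        return None  # the backlog never clears without added capacity
    return math.ceil(backlog / spare_capacity)


# Illustrative numbers only: 1,200 claims on a typical day, 22 business days offline.
backlog = estimate_backlog(1200, 22, deferred_fraction=0.8)
recovery_days = days_to_clear(backlog, daily_volume=1200, daily_capacity=1800)
```

A result of `None` from `days_to_clear` is the signal that the system cannot catch up at its current capacity, which is worth knowing before the lights come back on.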

Got modernization?

Modern engineering practices are key to surviving an outage and make planning for such events much easier. We live in a rapidly evolving, technology-driven world with complex systems and security risks, and today’s organizations need to rely on Agile practices and a DevOps mindset to survive an unexpected outage. While Agile is designed to help you embrace change and innovate, DevOps ensures your development teams are constantly delivering new features to production and addressing issues in real time. Teams adopting Agile and DevOps are accustomed to change and can easily make the shifts necessary to address an outage.

When an outage occurs, pull out the contingency plan to make sure everyone knows what’s coming and how to prepare. This is where all the planning becomes valuable. The plan should include the appropriate people up, down and across the organization so they can understand what is going on and how to fix it. In addition, having a set cadence and script for status reports as a baseline, and pulling in the right visibility metrics, can be extremely helpful.

It’s critical to prepare the elastic infrastructure the team has designed and implemented to handle increased volume when normal operations resume. When the switch is flipped back on, everyone needs to be ready and confident. Don’t forget to keep an eye on queues to know if your estimates of transaction volumes match what’s really happening. If not, get prepared before the system comes back online.
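One simple way to check whether your estimates match what is really happening is to compare observed queue depth against the estimate and flag large gaps. The sketch below is a minimal illustration in Python; the function names and the 20 percent tolerance are assumptions, not a recommended threshold.

```python
def estimate_drift(estimated, observed):
    """Relative gap between observed and estimated queue depth."""
    if estimated <= 0:
        raise ValueError("estimated must be positive")
    return (observed - estimated) / estimated


def needs_replan(estimated, observed, tolerance=0.2):
    """True when the observed queue differs from the estimate by more than the tolerance."""
    return abs(estimate_drift(estimated, observed)) > tolerance


# Illustrative numbers only: the plan assumed 20,000 queued requests.
replan_now = needs_replan(estimated=20000, observed=27000)
```

A check like this can run on whatever cadence your status reports already follow, so a growing gap shows up before the system comes back online.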

Rapid adoption of these practices will be ineffective if it is not well integrated with appropriate business approaches, which is where DevOps comes in and makes a big difference. DevOps adoption isn’t cheap, quick or easy, but the significant advantages it brings will make these investments worthwhile. There’s a reason why tech giants such as Facebook, Google, Twitter and Amazon have pioneered these approaches.

Don’t forget your customers

Just because your system is offline doesn’t mean your customers are. In these situations, over-communication wins the day. Communicate with stakeholders and customers early and often. Here are a few tips:

  • Establish a communications war room. Setting clear expectations is critical for avoiding unnecessary swirl — people are already frustrated, but you can help by getting everyone focused on clear expectations and what needs to be done to get back on track.
  • Set up auto-reply messages or system notices that are clear, concise and actionable.
  • Designate team members who can quickly address your customers’ technical issues during the shutdown. Customer experience is even more critical during an outage.
  • When you come back online, make sure you communicate to those same stakeholders and customers so they know what’s going to happen next, and by when. This will reduce the calls you receive asking for status updates.

All of this may sound impossible, but we have seen agencies that are doing this right now and doing it very well. You won’t read about them in the headlines, but they are the true heroes of system outages.

These suggestions can’t all be implemented in the moment of an outage. Successfully navigating an outage takes effective planning and a dedication to true modernization to help your system, and your agency, stay relevant in times of change. Putting in the time to innovate today will help you evolve and better navigate the challenges of tomorrow.

Matt Pincombe leads consulting service delivery across federal and private sectors at Excella, an Agile consulting and training, IT modernization and analytics company.
