Learning From CrowdStrike—the Board’s Role in a Major IT Outage

By Paul Connelly

07/24/2024

Many of us awoke last Friday to our phones lighting up with messages about Windows devices around the globe crashing with the dreaded “Blue Screen of Death.” As a former head of cybersecurity at the White House and a Fortune 100 health-care provider, I still experience immediate dread from predawn text messages. I was conditioned to jump into action as a chief information security officer. Now as a board member, I asked: How do I add value?

Whether from a faulty software update, cybersecurity attack, tornado, or backhoe cutting through a critical cable, broad information technology (IT) outages can have massive impacts on business operations, revenue, and reputation. In this case, CrowdStrike, a cybersecurity firm, pushed an update to its Falcon Endpoint Detection and Response product that caused Windows devices that applied it to crash. The outage severely impacted emergency systems, hospitals, airlines, banks, hotels, and other business and government entities. Although CrowdStrike quickly identified the problem and deployed a fix, deleting the update and restoring systems requires significant manual intervention on most devices, making restoration of service time-consuming for most IT teams, especially those with distributed workforces.

Boards are responsible for helping shape their company’s strategy and providing oversight of risk management. There are numerous opportunities to support management in an event of this nature while maintaining a “noses in, fingers out” role.

During the crisis: Support management and monitor how the leadership team, workforce, and partners respond. 

  • Leadership: Is there a clear lead and coordinated response following a preestablished plan?
  • Workforce: Are the employees trained and informed on how to respond?
  • Endurance: Can the company sustain the response for an extended period? In a prior role, I saw that our hospital teams were prepared for a three- to four-day hurricane event or a short-term system outage, but the emerging threat of a ransomware attack taking down IT systems for several weeks required us to retool our response plans from a sprint to a marathon.
  • Communications: Is management transparent, timely, and striking the right tone in communications with the board, employees, customers, regulators, and community?
  • Risk management: Are legal, compliance, internal audit, and other risk management teams engaged and involving external specialists where appropriate? Is the company responding in an ethical and legal manner?
  • Partners: Are your technology and business partners working in good faith with your team?

Long-term: Focus on prevention and resilience strategy.

  • Response: Did the response plan work effectively? Did management conduct a postmortem to find improvements?
  • Third-party impacts: Did affected third parties impact your business? Each “yes” needs follow-up risk mitigation as part of business continuity planning.
  • Impacts to your external partners: Did your systems affect customers or other business partners? If so, how do you reduce that risk and mitigate the potential liability?
  • Technology architecture: Is the company following a framework to implement the right prevention and resilience measures? Do you have material dependencies on single products or vendors? For example, Mac and Linux hosts were not impacted by the CrowdStrike event. Can you diversify elements of your IT systems to reduce single points of failure?
  • Ability to flex: Does the company have retainer agreements with external partners for surge IT support, data forensic investigations, external legal counsel, and other likely needs?

Boards should also look in the mirror and evaluate their preparedness in this area. The board has a fiduciary responsibility to be informed and thoughtful in its oversight. Directors should consider these questions when assessing potential skill gaps and areas for improvement in their oversight: Does the board devote adequate time to cybersecurity, artificial intelligence, and other technology risks and opportunities? Does the board have the expertise needed to go beneath the surface level in evaluating management’s strategy and activities?

This was not a cybersecurity attack; it was an error in a software update, but it had a similar impact. Just as many companies learned from the Change Healthcare cybersecurity event earlier this year, if you put all your eggs in one basket (vendor, product, or system) and that basket fails, your business is in trouble. Technology and cybersecurity programs today must focus on building resilience just as much as they have traditionally focused on preventative measures, because recent events have shown it is a matter of when, not if, an event like this will happen again.

Even if your company was not affected this time, as a board member, your focus on the resilience of company and third-party IT systems can help prevent a next time.

Paul Connelly is an independent director on the boards of Fortified Health Security and Dismas. He is a technical advisor to the boards of the United Network for Organ Sharing and the US Organ Procurement and Transplantation Network.