In the first of this two-part blog, I reported the impact that the Dyn DDoS attack had on the financial services industry. Banks, insurers, credit card companies, and others were hit by two waves of impact on Oct. 21, with many websites clocking in with 60-second page response times, and others failing outright, unable to service their customers.
In Part 2, we'll dig into some details to better understand the technology risks of financial services websites, and extract some lessons learned for the industry.
After releasing that first blog, I received numerous comments. A few of them confirmed that some banks were knocked completely offline for 3 hours. The impact of the Dyn attack was not just an artifact of the data shown in the Dynatrace measurements; it had real impact on customers.
The most surprising comment I received was from an industry association that shares cyber security incidents. That group did not know about the widespread impact that the attack had on financial services companies. There is a lesson to be learned here regarding monitoring, which I'll get to in a minute.
Digging into the details - More than just U.S. impact:
An important detail that I did not include in Part 1 is that the impact was not limited to U.S. firms. Below is an example of U.K. websites that were impacted on Oct. 21. The first wave of attacks against Dyn that day, beginning at 7:10 AM EDT, did not impact these U.K. sites. As reported by Dyn, those initial attacks included their US-West name servers. The second wave of attacks, which was more globally diverse, did impact the U.K. sites, as shown in Figure 1.
Type One Failure - DNS failure for the company's own website:
The first type of failure on Oct. 21 can be classified as a failure of the company's own DNS service provider. Figure 2 shows such an example.
This chart is created using the Dynatrace tool, by clicking on one of the red errors in the scatterplots shown in the first blog. I've redacted the bank's name and IP address from this chart, but it clearly shows a DNS lookup failure for the bank's home page.
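For readers who want to see what a DNS lookup failure looks like from a monitoring script's point of view, here is a minimal sketch in Python. It is not how the Dynatrace tool works internally; the function name `check_dns` and the injectable `resolver` parameter are my own assumptions for illustration. It simply asks the standard library to resolve a hostname and records whether the lookup succeeded.

```python
import socket
from typing import Callable


def check_dns(hostname: str, resolver: Callable = socket.getaddrinfo) -> dict:
    """Try to resolve a hostname and report success or failure.

    `resolver` defaults to the standard library's socket.getaddrinfo.
    It is injectable (an assumption made for this sketch) so the check
    can be exercised without touching the network.
    """
    try:
        results = resolver(hostname, 443)
    except socket.gaierror as exc:
        # This is the "Type One" case: the name could not be resolved.
        return {"host": hostname, "ok": False, "error": str(exc), "addresses": []}

    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address.
    addresses = sorted({info[4][0] for info in results})
    return {"host": hostname, "ok": True, "error": None, "addresses": addresses}
```

Calling `check_dns("www.example.com")` on a schedule and alerting when `ok` is false would be a very rough stand-in for a synthetic DNS check.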
Type Two Failure - embedded 3rd party content failures:
As bad as it is to have your website down because your DNS service is unavailable, many would agree that it is easier to deal with than the second type of problem: third party content failures.
Websites today, especially home pages, are loaded with tags, analytics, and content of many types from third parties. Each of these serves a purpose for the financial institution, but each also introduces risk.
To better understand this issue, it's first necessary to show how complex bank home pages are today. Figure 3 shows a waterfall chart of a typical bank home page. This bank's home page requires over 120 connections to web servers, with dozens of DNS requests, and hundreds of objects.
I've done my best to condense the chart, and it's still hardly readable. But that's the point: it's very complex. Any of the connections, DNS lookups, or objects could result in a failure. And on Oct. 21, the day this chart was recorded by Dynatrace, 11 DNS failures occurred on this bank's home page.
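One rough way to gauge this exposure for your own home page is to count how many distinct hostnames its resources come from, since each distinct hostname is a separate DNS lookup that could fail. The sketch below assumes you already have the list of resource URLs (for example, exported from a waterfall chart); the function name `summarize_hosts`, the `first_party_suffix` parameter, and the example domains are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def summarize_hosts(resource_urls, first_party_suffix):
    """Count requests per hostname and split first party from third party.

    A host is treated as first party if it equals `first_party_suffix`
    or is a subdomain of it. This is a simplification: real sites often
    own several unrelated domains.
    """
    hosts = Counter()
    for url in resource_urls:
        hostname = urlsplit(url).hostname
        if hostname:  # skip relative or malformed URLs
            hosts[hostname] += 1

    def is_first_party(host):
        return host == first_party_suffix or host.endswith("." + first_party_suffix)

    third_party = sorted(h for h in hosts if not is_first_party(h))
    return {
        "total_requests": sum(hosts.values()),
        "distinct_hosts": len(hosts),
        "third_party_hosts": third_party,
    }
```

Every entry in `third_party_hosts` is a name that someone else's DNS has to answer for before your page finishes loading.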
Lessons learned for the industry:
- Examine risk, not just technology. Bank website A depends upon service provider B, which in turn depends on service provider C, and so on, in an unending chain of dependencies. Regulators tell banks to examine the risks of their suppliers' suppliers. If only it were that easy. In the real world, Bank website A may rely upon 20 service providers, some of which are internal to the bank, or with other banks, or perhaps with a central bank such as the Federal Reserve. Those 20 service providers may in turn depend upon 20 other service providers. Circular dependencies certainly exist. No amount of technology examination would have prevented the type of disruption and impact that the industry experienced in the Dyn incident. Proper risk management and incident response policies will allow banks to better deal with dependency chain failures in the future.
- Make fewer requests. Nearly ten years ago Steve Souders published High Performance Web Sites, which included 14 rules for speeding up websites. Rule #1 was king: make fewer requests. Not only do fewer requests speed up a web page, but fewer requests also reduce the risk of failures in the dependency chain.
- Use a Content Delivery Network. While we're on the topic of Steve Souders' 14 rules, let's continue on to his Rule #2: Use a content delivery network. Not only does the CDN act as a shock absorber against many common attacks, but in Akamai's case, we can include both passive and active logic at our Edge to improve the resiliency of your website. Passively, we will cache DNS responses for up to 2 days, so that in cases where the DNS lookup does not respond we have an IP address to use, regardless of the TTL. Actively, we can detect the availability of resources included on the page and configure alternate actions to work around those problems, such as going to an alternate source for the resource, resulting in no impact to your customers.
- Utilize monitoring and set up alerts. The first email that you receive about a problem with your website should not be from your call center. Use external monitoring, whether synthetic, such as Dynatrace, or real user monitoring, such as Akamai's, to spot problems before your customers do.
- Information sharing is important, but it's not automatic. The Dyn incident highlights a shortcoming of mandated reporting of DDoS attacks. If the mandate is "you must report all DDoS attacks against your site within two hours", what if the DDoS attack is not against you? What if it is against a service provider somewhere in your dependency chain? What if it's not actually a DDoS attack, but a service incident? What if the result is a slow web page, and not an outright failure? The fact that the Dyn impact was not shared among financial services institutions should not be a surprise. It's a complex world.
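To make the passive caching idea from the CDN lesson more concrete, here is a toy sketch of a "serve stale on error" DNS cache in Python. This is not Akamai's implementation, and it leaves out normal TTL handling entirely (it asks the resolver on every lookup). The class name `StaleOnErrorCache`, the `resolver` callable, and the injectable `clock` are all assumptions made for illustration. The only behavior it demonstrates is this: remember the last good answer, and if a later lookup fails, keep using that answer for up to a maximum age, here defaulting to 2 days.

```python
import time
from typing import Callable, Dict, Tuple

TWO_DAYS_SECONDS = 2 * 24 * 3600


class StaleOnErrorCache:
    """Remember the last good address per host and reuse it on failure."""

    def __init__(
        self,
        resolver: Callable[[str], str],
        max_stale_seconds: float = TWO_DAYS_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolver = resolver          # hypothetical: host -> IP, raises OSError on failure
        self._max_stale = max_stale_seconds
        self._clock = clock                # injectable so the sketch can be tested
        self._last_good: Dict[str, Tuple[str, float]] = {}

    def lookup(self, host: str) -> str:
        try:
            address = self._resolver(host)
        except OSError:
            cached = self._last_good.get(host)
            if cached is not None:
                address, stored_at = cached
                if self._clock() - stored_at <= self._max_stale:
                    # Lookup failed, but a recent enough answer exists.
                    return address
            raise  # nothing usable in the cache: surface the failure
        # Lookup succeeded: refresh the remembered answer.
        self._last_good[host] = (address, self._clock())
        return address
```

The design trade-off is the usual one for stale answers: a possibly outdated IP address is usually better than no address at all during an outage, but the maximum age should be bounded so that a long-dead address is not used forever.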
These are just a few of the many lessons that can be drawn from the Dyn incident. No amount of technology examination or risk management can avoid all problems, and the industry now recognizes this. The recently announced FS-ISAC "Sheltered Harbor" initiative is a big step forward to ensure the resiliency of the industry as a whole. Akamai supports this initiative, and we look forward to continuing to improve our products and services to serve this industry.
Rich Bolstridge is Chief Strategist, Financial Services at Akamai Technologies