The Akamai Blog

Singapore Cable Cut Demonstrates Power of a Highly Distributed CDN

On February 29th, marine cable cuts in and around Singapore caused significant connectivity issues that now, over a month later, have not been entirely resolved. * This has caused problems for many online businesses that operate in and around the Pacific Rim; however, Akamai customers have had a different experience than everyone else.

Before we continue, let's back up for some context. If you've read our blog, read about our unique architecture, or listened to any of our speakers, you've likely heard from Akamai about the importance of a "highly distributed network" and the power of Akamai's Intelligent Platform: 200,000+ servers in 100+ countries and 1,400+ networks. So what? What does that really mean? In this post, we'll show some data related to the Singapore cable cut to give you a better idea of the power of our platform in the face of significant network disruption.

Below is data from Akamai's Site Analyzer comparing response times for data delivered over standard Internet Border Gateway Protocol (BGP) routing with response times for data delivered by Akamai, both before and after the cable cut. In particular, it shows traffic flowing from datacenters in Japan to measurement agents in Singapore. In these figures, the red line indicates traffic attempting to connect via the default BGP-specified route, while the blue line represents connections made with routes optimized by Akamai's Intelligent Platform. It is important to note that this graph does not reflect caching; it purely reflects response times for traffic routed across the Akamai platform using SureRoute. That is why the two lines show similar response times until the moment of network instability.

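[Figure: Cable Cut.png. Average response times from Japan to Singapore over the default BGP route (red) and the Akamai-optimized route (blue), before and after the cable cut.]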

The cable cut took place on February 29, which is where you see the blue and red lines diverge. As the red line shows, default BGP connections suffer significant latency as measured by average response times. The spiky shape of the graph reflects the fact that over BGP many packets simply never arrive at their destinations, causing complete failures to connect and wildly inconsistent performance measurements. Meanwhile, throughout the event there is no change whatsoever in the latency and performance experienced by data optimized and delivered by the Akamai Intelligent Platform (the blue line).

Why is this? Why did Akamai's network continue to successfully deliver high-performing content while default BGP routing failed? Was it because of caching? Again, no: these figures do not reflect caching. There are two reasons for Akamai's performance: Akamai's highly distributed CDN architecture, and the power of the software embedded in Akamai's Intelligent Platform.

Let's take these one at a time, starting with why being highly distributed mattered. Akamai's network is architected so that we have servers in as many of the places where our customers want to reach audiences as possible, in an effort to get as close to those audiences as we can. Currently we sit within one network hop of 90% of the Internet's users. This is in stark contrast to the centralized or "super POP" architecture of other CDNs, in which "edge servers" sit in centralized Internet backbones rather than being distributed across the true "edge" closest to real users. As a result, delivering content to users with these more centralized models often requires going through congested peering points. Since BGP is not a performance-based protocol, it does not always provide the lowest-latency routes, nor can it respond quickly to outages, errors, or congestion, as was the case with the Singapore cable cuts. In fact, in this case we see not only BGP's failure to route around the immediate outage, but also, weeks later, its inability to overcome the congestion on the routes that *do* remain. The highly distributed positioning of Akamai's servers allows its software to take advantage of both geographic and network proximity to end users to route traffic around not only outages, but also the points of congestion that arise after an outage.
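To make the "not a performance-based protocol" point concrete, here is a minimal sketch of how a rule that prefers the shortest path can differ from one that prefers the fastest path. It is an illustration only: the route names, hop counts, and latency figures are invented, and real BGP path selection weighs several more attributes (local preference, MED, and so on) than the single AS-path-length rule shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str          # hypothetical label, for illustration only
    as_path_len: int   # number of autonomous systems traversed
    latency_ms: float  # measured round-trip time (made-up value)


def pick_by_path_length(routes):
    """Simplified BGP-style choice: fewest AS hops wins, latency is ignored."""
    return min(routes, key=lambda r: r.as_path_len)


def pick_by_latency(routes):
    """Performance-based choice: lowest measured latency wins."""
    return min(routes, key=lambda r: r.latency_ms)


# Two candidate routes after a hypothetical cable cut: the short route is
# still advertised but congested, the longer detour is much faster.
routes = [
    Route("direct-congested", as_path_len=2, latency_ms=480.0),
    Route("longer-detour", as_path_len=4, latency_ms=95.0),
]

print(pick_by_path_length(routes).name)
print(pick_by_latency(routes).name)
```

The path-length rule keeps choosing the congested route because nothing in its decision looks at how the route is actually performing.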

This brings us to reason #2: Akamai's Intelligent Platform. When we refer to the Intelligent Platform, we're not just talking about the servers, but also the proprietary software built into our network. This software, in this case SureRoute, provides the intelligence necessary to resolve network trouble spots, and specifically has built-in testing methodologies to respond (1) rapidly to sudden and unexpected outages, and (2) over time to evolving points of congestion and ongoing interruption. In the former case, the software is so powerful and the response so rapid that when a service interruption occurs, such as the Singapore cable cuts, there isn't even visual evidence that we are "responding" to the event. Instead, performance and latency stay consistently good and service is uninterrupted. In the latter case, the software continues doing its job long after the initial interruption, in response to an evolving set of challenges that represent the "new norm." The network, as you can see from the blue line in the chart above, just continues to operate as normal.
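SureRoute's internals are proprietary and are not described in this post, so the following is only a generic sketch of the idea of probe-based path selection: test several candidate paths continuously, penalize those that lose probes, and send traffic over whichever currently scores best. The path names, sample values, and the `LOSS_PENALTY` weight are all assumptions made for the example.

```python
import math

# Assumed weight: how heavily lost probes count against a path.
LOSS_PENALTY = 10.0


def score(samples):
    """Score one path from its probe samples (RTT in ms, None = probe lost).

    Lower is better. A path with no successful probes is treated as down.
    """
    ok = [s for s in samples if s is not None]
    if not ok:
        return math.inf
    loss_rate = 1 - len(ok) / len(samples)
    mean_rtt = sum(ok) / len(ok)
    return mean_rtt * (1 + LOSS_PENALTY * loss_rate)


def best_path(probe_results):
    """Return the name of the path with the lowest score."""
    return min(probe_results, key=lambda name: score(probe_results[name]))


# Hypothetical probe rounds before and after an outage on the direct path.
before_cut = {
    "direct": [70.0, 72.0, 71.0],
    "via-intermediate": [90.0, 88.0, 91.0],
}
after_cut = {
    "direct": [None, 650.0, None],
    "via-intermediate": [92.0, 90.0, 95.0],
}

print(best_path(before_cut))
print(best_path(after_cut))
```

Because the same scoring runs on every probe round, the sketch handles both cases described above with one mechanism: a sudden outage shows up as lost probes in the next round, and lingering congestion shows up as a persistently worse score.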

This is why we keep talking about the power of Akamai's platform, and why the world's top retailers, media companies, financial institutions and more rely on Akamai to move their businesses Faster Forward.

* http://www.theregister.co.uk/2016/03/03/linode_says_several_cables_serving_singapore_cut/