The Akamai Blog

Serving at the edge: Good for performance, good for mitigating DDoS - Part I

Akamai's Chief Security Officer Andy Ellis recently commented on large DDoS attacks and how "size" can be misleading. In that post, Andy notes that if you have more than 300 Gbps of ingress capacity, then a 300 Gbps attack is not going to hurt you too much.

He's right of course. However, total ingress capacity is only part of the equation. Also important are the type of attacks you're facing and your "ingress" configuration. I'd like to dig a little deeper into these two topics, and explain how a widely distributed infrastructure is useful for both improving performance and mitigating attacks.

Not surprisingly, I've used a generic CDN example to set the stage, but most of the concepts here apply to any large backbone network with many peering and transit links. Because we are talking about CDNs, we should first ask why CDNs push traffic to the "edge", and even before that, what is the "edge"?

Why serve at the edge?

On the Internet, the "edge" usually refers to where end users ("eyeballs") are located. It is not a place separate from the rest of the Internet. In fact, the edge and what most people consider the core are frequently only a few feet apart. Topology matters here, not geography.

The reason CDNs want to serve at the edge is that it improves performance. Much has been written about this, so I shan't bore you here. The important thing to realize is that all CDNs distribute traffic. However, when CDNs were invented, the distribution was not designed to mitigate attacks; it just worked out that way.

And it worked out well. Let's see why.

Ingress capacity: How much is enough?

To set the stage further, we are going to discuss a "typical" DDoS (as if there were such a thing!) and possible mitigation strategies, not a specific attack.

The first and most obvious mitigation strategy is what Andy mentioned in his post: Have enough ingress capacity to accept the traffic, then drop or filter the attack packets. This raises the question of what "ingress capacity" means. If you have a single pipe to the Internet, making that pipe bigger is the only answer. While plausible, that would be very difficult, and very, very expensive to do with the size of attacks seen on the 'Net today.

Now, suppose you have many ingress points, such as a CDN with multiple links and nodes. Do you need to ensure each and every point is large enough for the worst-case DDoS scenario? Doing so would be insanely expensive and, frankly, nearly impossible. Not every point on the Internet can support connections into the hundreds of Gbps.

Fortunately, the first 'D' in DDoS is "Distributed", meaning the source of a DDoS is not a single location or network, but spread widely. And because of the way the Internet works, different attack sources will typically hit different CDN nodes and/or links.

The chance of a distributed attack landing entirely on a single node is very, very small. Of course, the opposite holds as well - the chance of an attack spreading perfectly evenly over all points is essentially nil. As such, you cannot just divide the attack volume by the number of nodes and assume that amount is the maximum bandwidth each node needs to survive an attack. How much extra capacity is needed per node depends on the exact situation, and it cannot be predicted in advance. This leads us to our next topic.
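To make the "you can't just divide" point concrete, here is a minimal simulation. All numbers are invented for illustration (they are not Akamai data): a 300 Gbps attack from many sources, each source routed to one of 50 ingress nodes roughly at random. Even with a uniformly random spread, the busiest node ends up carrying more than its "fair share" of the attack.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical numbers, chosen only for illustration.
TOTAL_ATTACK_GBPS = 300
NUM_SOURCES = 10_000   # individual attack sources (bots)
NUM_NODES = 50         # CDN ingress nodes

per_source = TOTAL_ATTACK_GBPS / NUM_SOURCES  # 0.03 Gbps per source

# Each source hits one node, chosen uniformly at random.
loads = [0.0] * NUM_NODES
for _ in range(NUM_SOURCES):
    loads[random.randrange(NUM_NODES)] += per_source

even_share = TOTAL_ATTACK_GBPS / NUM_NODES  # 6 Gbps if spread perfectly
print(f"even share per node : {even_share:.1f} Gbps")
print(f"busiest node        : {max(loads):.1f} Gbps")
print(f"quietest node       : {min(loads):.1f} Gbps")
```

The busiest node always lands above the 6 Gbps "even share" and the quietest below it, so sizing every node to exactly attack ÷ nodes guarantees some node will be over capacity. Real routing is far less uniform than this sketch, which only makes the spread worse.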

Node capacity: How much is enough?

Trying to size each node properly for an attack is an art, not a science. A black art. A black art that is guaranteed to fail at some point.

Of course, everyone still tries. They attempt to estimate what attack capacity is needed based on things like where a node is, what customers are on the system, how much connectivity costs, and several other factors. For instance, a node in Düsseldorf serving a small DSL network exclusively probably does not need as much attack capacity as a large node in Silicon Valley serving several Tier-1 providers combined. Engineers pray to the network gods, sacrifice a router and maybe a switch or two in their names, make a plan, implement it, and... pray some more.

But pray as they might, the plan will fail. Sooner or later, some attack will not fit the estimates, and a node will be overwhelmed with attack traffic. Don't let this get you down if you are making your own attack plan; remember, the same is true for everyone. Not only is it impossible to predict how large an attack is going to be, but as mentioned above, it is also impossible to predict what percentage of the attack will hit each node. Worse, since unused attack capacity is wasted money - a lot of wasted money - CFOs tend to push for less rather than more, making the plan that much more likely to fail.

The problem with an overloaded node is that it doesn't matter how many nodes you have: if one is overloaded, any traffic going to that node will be affected. This means that if your link to London is overwhelmed with attack traffic, it doesn't matter how many nodes you have in Tokyo, Sydney, San Jose, etc.; your users going to the London node are suffering.

As a result, while CFOs push for less, engineers push for more.

In Part II, I'll cover in greater detail why a distributed infrastructure, such as the Akamai Intelligent Platform, is ideal for mitigating even the largest of DDoS attacks.


Patrick Gilmore is a Chief Network Architect at Akamai