
Akamai Security Intelligence & Threat Research

Domain Reputation System: building a large graph to generate real-time threat intelligence

Why do we need a Knowledge Base system

Let me start with an obvious statement: the Internet generates a lot of data. Every day we, Akamai's security research teams, see billions of DNS queries, millions of domains, and who knows how many IP addresses. This is an exciting thing, especially if you're a data scientist.

In the past year, we have taken on a "simple task": to map the "dark side of the Internet" - the place where malicious activities are born, die, and constantly change. Since we have significant volumes of DNS traffic data, we believed that taking on this epic task was feasible.

Seeing all the data in the world does not provide all the value in the world unless we can tame the data. Data taming means making sense of the data, and that required us to understand the relationships between domains, hosting servers, name servers, CNAMEs, and autonomous systems (AS), and to learn how these relationships change from day to day. With this information, we can finally identify which of these relationships are used for malicious activities.

To achieve these goals, we created the Domain Reputation System (DRS), a huge DNS-based knowledge topology graph. DRS currently includes over a billion nodes and edges, and this number is growing every day. Using this knowledge base, we are able to generate real-time detection of many sorts of threats, including phishing, botnet, and malware-related attacks.

How DRS works

The cyber-ingredients that go into DRS include DNS data we receive from customers, WHOIS registration info, and 100+ feeds we obtain from 3rd party vendors. Cooking with these ingredients presents a real challenge, given the high volume of read and write operations that need to happen simultaneously. Approaches such as Hadoop-based systems or graph databases would generate many bottlenecks. Instead, we implemented DRS in C++, with separate threads for ingesting, aggregating, serving the API, and purging data.
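To make the thread-per-stage layout more concrete, here is a minimal sketch in Python (DRS itself is written in C++). The class `TinyGraphStore`, the `(name, ip, timestamp)` event shape, and the purge cutoff are all assumptions made for illustration; they are not details of the DRS implementation.

```python
import queue
import threading


class TinyGraphStore:
    """Hypothetical in-memory edge store shared by several worker threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_seen = {}  # (name, ip) -> last-seen timestamp

    def add_edge(self, name, ip, ts):
        with self._lock:
            key = (name, ip)
            self._last_seen[key] = max(ts, self._last_seen.get(key, ts))

    def purge_older_than(self, cutoff):
        """Drop edges not seen since `cutoff`; return how many were removed."""
        with self._lock:
            stale = [k for k, ts in self._last_seen.items() if ts < cutoff]
            for k in stale:
                del self._last_seen[k]
            return len(stale)

    def edges(self):
        with self._lock:
            return sorted(self._last_seen)


def ingest_worker(events, store):
    """Consume (name, ip, ts) events until a None sentinel arrives."""
    while True:
        event = events.get()
        if event is None:
            break
        store.add_edge(*event)


def run_demo():
    events = queue.Queue()
    store = TinyGraphStore()
    worker = threading.Thread(target=ingest_worker, args=(events, store))
    worker.start()
    for event in [("a.example", "192.0.2.1", 100),
                  ("a.example", "192.0.2.2", 200),
                  ("b.example", "192.0.2.1", 300)]:
        events.put(event)
    events.put(None)  # tell the ingest thread to stop
    worker.join()
    # In a long-running service a dedicated purge thread would call this periodically.
    removed = store.purge_older_than(150)
    return removed, store.edges()
```

Calling `run_demo()` ingests three made-up events on a background thread and then purges the one edge last seen before timestamp 150. A single lock is the simplest way to keep the sketch correct; a system handling hundreds of thousands of events per second would presumably need much finer-grained synchronization.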

DRS ingests hundreds of thousands of events per second, and generates a real-time graph with billions of nodes and relationships. The C++ multithreaded graph engine supports a quick drill-down into any 'rabbit hole' that can help security analysts and researchers perform an effective investigation. As shown in the figure below, although these names all share network features such as IP address and CNAME redirect, and follow a similar string pattern, only a few of them are identified as phishing by 3rd party vendors. Even so, the DRS graph can propagate a notion of "maliciousness" through the interconnected nodes, using pattern-based matching and autonomous clustering.

DRS nodes and edges example.
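The idea of propagating "maliciousness" across connected nodes can be illustrated with a deliberately simple stand-in: names that share a resolved IP with a known-bad name inherit a decayed score. This is not the pattern-based matching and autonomous clustering that DRS uses; the function name, the decay factor, and the example domains are assumptions chosen for the sketch.

```python
from collections import defaultdict


def propagate_scores(name_to_ips, seeds, decay=0.5, rounds=2):
    """Spread a 'maliciousness' score from seed names to names sharing an IP.

    name_to_ips: dict of domain name -> set of resolved IPs
    seeds: dict of domain name -> initial score in [0, 1]
    """
    # Invert the name -> IP mapping so neighbors can be found through shared IPs.
    ip_to_names = defaultdict(set)
    for name, ips in name_to_ips.items():
        for ip in ips:
            ip_to_names[ip].add(name)

    scores = {name: seeds.get(name, 0.0) for name in name_to_ips}
    for _ in range(rounds):
        updated = dict(scores)
        for name, ips in name_to_ips.items():
            for ip in ips:
                for neighbor in ip_to_names[ip]:
                    if neighbor != name:
                        # A neighbor inherits a decayed share of the score.
                        updated[neighbor] = max(updated[neighbor],
                                                scores[name] * decay)
        scores = updated
    return scores


graph = {
    "login-a.example": {"198.51.100.7"},
    "login-b.example": {"198.51.100.7", "198.51.100.8"},
    "login-c.example": {"198.51.100.8"},
    "unrelated.example": {"203.0.113.9"},
}
scores = propagate_scores(graph, seeds={"login-a.example": 1.0})
# login-b.example ends at 0.5, login-c.example at 0.25, unrelated.example at 0.0
```

With only one name flagged by a feed, the two names reachable through shared IPs pick up a non-zero score while the unconnected name stays at zero.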

On top of generating an enormous graph, DRS supports a rule engine and pattern-based matching to cover the detection of specific threats. The engine currently includes over 30 different use cases that are executed every 30 seconds. New rules can be manually defined using a set of features such as hosting server IP, name server, CNAME redirect, name regular expression pattern, last seen timestamp, and so on. The end goal is to build an autonomous, AI-driven rule engine in DRS, in which the system itself can suggest dynamic rules, then test and validate them. We are on track to make this happen within a few months.
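A manually defined rule of this kind can be pictured as a set of optional conditions over the listed features. The sketch below is an assumption about how such a rule might be represented; the `Rule` and `DomainRecord` classes and their field names are hypothetical and do not reflect the actual DRS rule syntax.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DomainRecord:
    name: str
    hosting_ip: str
    name_server: str
    last_seen: int  # Unix timestamp


@dataclass
class Rule:
    label: str
    hosting_ip: Optional[str] = None
    name_server: Optional[str] = None
    name_pattern: Optional[str] = None
    seen_after: Optional[int] = None

    def matches(self, rec: DomainRecord) -> bool:
        """A record matches when every condition that is set holds."""
        if self.hosting_ip is not None and rec.hosting_ip != self.hosting_ip:
            return False
        if self.name_server is not None and rec.name_server != self.name_server:
            return False
        if self.name_pattern is not None and not re.search(self.name_pattern, rec.name):
            return False
        if self.seen_after is not None and rec.last_seen < self.seen_after:
            return False
        return True


def run_rules(rules, records):
    """Return (label, name) pairs for every rule/record match."""
    return [(r.label, rec.name) for r in rules for rec in records if r.matches(rec)]


rules = [Rule(label="fake-flash", hosting_ip="198.51.100.7",
              name_pattern=r"flash.*update")]
records = [
    DomainRecord("flashplayer-update.example", "198.51.100.7", "ns1.example", 1000),
    DomainRecord("news.example", "198.51.100.7", "ns1.example", 1000),
]
hits = run_rules(rules, records)
# hits == [("fake-flash", "flashplayer-update.example")]
```

Both records share the hosting IP, but only the first also matches the name pattern, so only it is reported. Running every rule on a fixed schedule, as the text describes, would amount to calling `run_rules` on a timer.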

Over the past year we have compared the output generated by DRS to threat intelligence feeds we receive from 3rd party providers. In the majority of cases where a name is also detected by another vendor, the 3rd party detection comes significantly later. This reinforces our belief that DRS is the way forward for real-time generation of threat intelligence. It is also noteworthy that many names in the same cluster are not identified by 3rd party providers at all; we can identify them because we see many more domains in the long tail than any 3rd party provider.


Examples of DRS early detection:

DRS detection time     Earliest 3rd party vendor detection    Early detection interval
2018-02-02 12:48:41    2018-02-03 10:25:18
2018-02-19 00:30:16    2018-03-02 19:29:28                    11 days, 18:59:12
2018-02-21 07:10:25    2018-03-02 19:29:28                    9 days, 12:19:03
2018-04-18 20:23:03    2018-04-18 22:27:58
2018-03-08 09:36:00    2018-03-08 16:19:48

2018-03-08 05:08:55
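The early detection interval is simply the 3rd party detection time minus the DRS detection time. A short Python sketch of that arithmetic, using two timestamp pairs from the examples above (the function name is ours):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def early_detection_interval(drs_time, vendor_time):
    """Return how much earlier DRS detected a name than the vendor did."""
    return datetime.strptime(vendor_time, FMT) - datetime.strptime(drs_time, FMT)


print(early_detection_interval("2018-02-19 00:30:16", "2018-03-02 19:29:28"))
# prints: 11 days, 18:59:12
print(early_detection_interval("2018-02-21 07:10:25", "2018-03-02 19:29:28"))
# prints: 9 days, 12:19:03
```

The same function applies to the same-day examples; for instance, the 2018-04-18 pair works out to 2:04:55.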



DRS in Action

With more than 100 billion DNS queries per day, DRS creates the most comprehensive real-time picture of the Internet. This picture opens a window onto many different activities that happen at any given time across the world. It can show how legitimate infrastructure looks and behaves (CDN activity, for instance), or how malicious structures tend to form (in some fast flux or crypto-mining activities, for example).

The graphs below provide examples of different types of DRS-generated graphs. They show the bipartite relationship between domain names and their resolved IPs, where the nodes are names and IPs, and the edges are the mappings of names to IPs.

Mostly Unharmful

We start with some examples of mostly non-malicious activity that DRS detects. DRS can be used to explore the Internet as a whole, and not just to identify malicious activities.

CDNs such as aocde.com or alikulun.net are resolved to many different IP addresses


The image above shows what a Content Delivery Network (CDN) looks like in DRS. A CDN provides infrastructure resilience benefits through behavior that looks like fast flux: IP addresses are swapped frequently using a combination of round-robin IP addresses and very short Time-To-Live (TTL) values on DNS resource records.
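The two signals mentioned here, many rotating IPs and short TTLs, can be summarized per name with a few lines of Python. This is a sketch under assumed thresholds (`min_ips` and `max_ttl` are illustrative values, not figures used by DRS), and it shows why these features alone cannot separate a CDN from malicious fast flux.

```python
def flux_features(answers):
    """Summarize DNS answers for one name.

    answers: list of (ip, ttl_seconds) tuples observed over some window.
    """
    ips = {ip for ip, _ in answers}
    ttls = [ttl for _, ttl in answers]
    return {"distinct_ips": len(ips), "min_ttl": min(ttls) if ttls else None}


def looks_like_flux(answers, min_ips=5, max_ttl=300):
    """True when a name rotates through many IPs with short TTLs.

    The thresholds are illustrative assumptions. A CDN name can satisfy
    this check just as easily as a malicious fast flux name.
    """
    f = flux_features(answers)
    return (f["min_ttl"] is not None
            and f["distinct_ips"] >= min_ips
            and f["min_ttl"] <= max_ttl)


rotating = [("192.0.2.%d" % i, 20) for i in range(1, 7)]  # 6 IPs, 20-second TTL
stable = [("192.0.2.1", 86400)] * 3                       # 1 IP, 1-day TTL
```

Here `looks_like_flux(rotating)` is true and `looks_like_flux(stable)` is false. Telling a CDN apart from a malicious cluster requires additional context from the graph, such as registration data or vendor feeds.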



Frank Nice VPN domains

DRS recently discovered a cluster of more than 900 '.us' domain names registered by the same fake entity ("Frank Nice") that resolved to more than 28 thousand unique IP addresses around the world. Further analysis of this graph revealed that these .us domains belonged to anonymized VPN services, used in AnchorFree's HotSpot Shield VPN, where exit IPs are assigned per client. Although it definitely looks like large-scale fast flux or a botnet, this is a VPN service, which can be used for either malicious or non-malicious activities.

A cluster of crypto-mining domains and their intra-relations (dns-seed.dash.org, dns-seed.dashdot.io, dns-seed.masternode.io, dns-seed.koin-project)

Unlike centralized mining pools, Dash p2p communities discover their peers by using DNS seeds. To discover peer nodes upon startup, the client issues DNS requests to learn the addresses of other peer nodes. DNS seed bootstrapping must be run by entities that have some minimum level of trust within the Dash community. Through DRS, we can see how different IP addresses or peers are multi-tasking, simultaneously working on different types of cryptocurrency projects, such as dash, paccoin, koin, reddcoin and others.
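The seed lookup and the "multi-tasking peer" observation can be sketched as follows. `resolve_seed` performs an ordinary DNS lookup with the standard library; `shared_peers` then finds IPs advertised by more than one seed. The function names and the sample addresses are assumptions for illustration, and the sketch says nothing about how DRS itself computes these relations.

```python
import socket
from collections import defaultdict


def resolve_seed(hostname):
    """Return the set of IPs a DNS seed currently resolves to (needs network)."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def shared_peers(seed_to_ips):
    """Map each IP to the seeds that list it, keeping IPs seen under 2+ seeds."""
    ip_to_seeds = defaultdict(set)
    for seed, ips in seed_to_ips.items():
        for ip in ips:
            ip_to_seeds[ip].add(seed)
    return {ip: sorted(seeds) for ip, seeds in ip_to_seeds.items()
            if len(seeds) > 1}


# Made-up resolution results; in practice each set would come from resolve_seed().
observed = {
    "dns-seed.dash.org": {"192.0.2.10", "192.0.2.11"},
    "dns-seed.masternode.io": {"192.0.2.11", "192.0.2.12"},
}
overlap = shared_peers(observed)
# overlap == {"192.0.2.11": ["dns-seed.dash.org", "dns-seed.masternode.io"]}
```

An IP that appears under the seeds of several projects is a candidate for the kind of multi-project peer described above.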

Malicious Activity: Malware Fast Flux

While some fast flux domains are legitimate (as we saw in the cases of CDNs or VPNs), many are malicious. In the example below, we show a cluster of subdomains of 000webhostapp.com that were used to host different malware files. 000webhostapp.com is a free web hosting service, where the hosted subdomains resolve to a shared pool of IP addresses. A high percentage of names hosted there are malicious, as identified by many security vendors. Below is an excerpt of names in the cluster and their name-IP mapping graph:

Fast Flux of malware downloading sites hosted on 000webhostapp.com

Malicious Activity: Fast Flux Phishing

The constant stream of Adobe Flash updates has always been a serious security headache. Using our Domain Reputation System (DRS), we recently discovered hundreds of new core domains per day, each with different subdomains that trick users into downloading and installing what claim to be the latest Adobe Flash security patches.

The chart below shows the traffic to a set of 12 fake Adobe Flash update domains over a period of a few weeks. The cluster itself was detected through a combination of a shared hosting IP and common naming conventions. The pattern of traffic per phishing site in the cluster is made up of high, short-lived peaks, which is a very common pattern in phishing attacks. Each domain's peak represents 40 to 60 unique victims (or potential victims), followed by a second, smaller spike of 10 or fewer victims.
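The peak-then-smaller-spike shape can be picked out of a per-domain series of unique client counts with a simple local-maximum scan. The counts below are invented to mimic the described pattern, and the `min_height` threshold is an assumption; this is only a sketch of the idea, not the analysis behind the chart.

```python
def find_spikes(counts, min_height=5):
    """Return (index, value) for local maxima at or above min_height."""
    spikes = []
    for i, value in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i + 1 < len(counts) else 0
        if value >= min_height and value > left and value >= right:
            spikes.append((i, value))
    return spikes


# Invented unique-client counts per time bucket for one phishing domain.
counts = [0, 3, 52, 4, 0, 8, 1, 0]
spikes = find_spikes(counts)
# spikes == [(2, 52), (5, 8)]: a main peak followed by a smaller second spike
```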


Traffic to Fast Flux Phishing Sites



The cartography of the Earth took thousands of years to develop; we expect the cartography of the Internet to take less time. We hope that DRS contributes to it, especially when it comes to mapping security threats.

The polymorphic nature of malicious threats today requires real-time detection and an agile defense. That was the initial reason we created the Domain Reputation topology graph, and tamed it to continuously identify threats. As of today, it detects and blocks hundreds of thousands of domains associated with different threats, shortly after they first appear. We recently introduced an API for DRS, and opened it internally to be used by Akamai teams.
