
The Akamai Blog

How Nominum Data Science Thwarts Cybercrime Through Industry-Leading DNS Data Analysis

Data Revelations: Fall 2016, the inaugural security report from the Data Science team at Nominum (now part of Akamai), analyzes some of the largest threats affecting organizations and individuals, including ransomware, DDoS, mobile malware, IoT-based attacks and more. Since DNS is the launch point for over 90% of cyberattacks, it offers a great vantage point from which to examine, understand, thwart and proactively prevent threats [1]. With industry-leading research experience, and by applying machine learning, artificial intelligence, natural language processing, neural networks and more, Nominum Data Science is able to locate, analyze, prevent and predict some of the most sophisticated and dangerous cyberthreats ever to hit the internet.

To provide proactive protection, Nominum Data Science analyzes daily, weekly and quarterly data to predict the next steps cybercriminals will take. A relatively quiet period, such as a lull in DDoS attacks, does not necessarily mean that a certain threat has been resolved, or that all is well with the network. We know that cybercrime is too lucrative to simply fall off the map, and that cybercriminals will go to great lengths to disguise their footprints. Silence therefore leads us to question more, rather than less, and to search for faint signals that could indicate an impending attack.

The science of detecting threats using DNS data employs a variety of proprietary data analytics tools and algorithms:

  1. Anomaly detection engine: Identifies anomalies in the data by comparing each queried domain to previous domain behaviors, or by identifying newly generated domains.
  2. Domain Reputation System (DRS): A large-scale, comprehensive knowledge-based system for domain names and their related entities. This tool detects subtle links between domains, hosting servers, name servers, WHOIS information and blacklist data, and measures the maliciousness of each domain based on its relationships.
  3. Correlation engine: Identifies subtle relationships between domain names and the clients that query them. This tool is specifically used to detect and cluster families of malicious domains.
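
Nominum's tools are proprietary, so their internals are not described here. As a rough illustration of the idea behind a correlation engine, the sketch below groups domains whose sets of querying clients overlap heavily. The log format, the Jaccard similarity measure, the 0.5 threshold and the function names are all assumptions made for this example, not the actual algorithm.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_domains(query_log, threshold=0.5):
    """Group domains whose client sets overlap by at least `threshold`.

    query_log: iterable of (client_id, domain) pairs (hypothetical format).
    Returns a list of sets of domains, one set per cluster.
    """
    clients_by_domain = defaultdict(set)
    for client, domain in query_log:
        clients_by_domain[domain].add(client)

    # Union-find over domains: linked domains end up under one root.
    parent = {d: d for d in clients_by_domain}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    # Pairwise comparison is quadratic; fine for a sketch, not for real traffic.
    for d1, d2 in combinations(clients_by_domain, 2):
        if jaccard(clients_by_domain[d1], clients_by_domain[d2]) >= threshold:
            parent[find(d1)] = find(d2)

    clusters = defaultdict(set)
    for d in clients_by_domain:
        clusters[find(d)].add(d)
    return list(clusters.values())
```

Two domains queried by the same small group of clients land in one cluster, while a domain queried by unrelated clients stays on its own. A production system would need a far more scalable comparison than checking every pair.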


Taking the temperature
In order to gauge the overall threat level for a certain time period, the Nominum Data Science team analyzes the relationship between malicious domains, infected clients and the queries made to these domains, using an extensive set of data. This approach produces a big picture view and helps to identify where the main threat lurks and where the next innovation is needed. This is not "perceived risk," but the actual risk as told (or measured) through big data analysis.
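
To make the three quantities concrete, here is a minimal sketch that summarizes a query log against a set of known malicious domains. The log format, the metric names and the function itself are hypothetical; the report does not say how Nominum computes its threat level.

```python
def threat_summary(query_log, malicious_domains):
    """Summarize malicious activity in a query log.

    query_log: iterable of (client_id, domain) pairs (hypothetical format).
    malicious_domains: set of domains already classified as malicious.
    """
    total = 0
    malicious = 0
    infected_clients = set()
    active_domains = set()
    for client, domain in query_log:
        total += 1
        if domain in malicious_domains:
            malicious += 1
            infected_clients.add(client)
            active_domains.add(domain)
    return {
        # Share of all queries that went to a malicious domain.
        "malicious_query_rate": malicious / total if total else 0.0,
        # Distinct clients that queried at least one malicious domain.
        "infected_clients": len(infected_clients),
        # Distinct malicious domains that were actually queried.
        "active_malicious_domains": len(active_domains),
    }
```

Computing the same summary for successive periods and comparing the rates is one simple way to see a trend such as a tripling of malicious traffic.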

The trend seen by Nominum Data Science over the past six months has been clear and consistent. Since around mid-2016, the rate of malicious traffic has tripled. Because a high percentage of malicious queries are made to Command and Control (C&C) servers, there has also been a formidable increase in the number of malicious domain names, which are frequently used as botnet communication and control points. A deeper investigation of these results reveals high botnet activity during the most recent period, specifically from Necurs--the largest botnet in existence today.

That's no domain for a new barbershop
The DNS layer provides excellent visibility into new domains--defined as domains that are seen for the first time in DNS queries. Nominum Data Science tracks new domains hourly, regularly examining how many are generated each month.
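
A "new domain" in this sense is simply a name that has not appeared in earlier queries. The sketch below counts first-seen domains per hour; the (timestamp, domain) log format, the in-memory set of known names and the function name are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime, timezone


def new_domains_per_hour(query_log, known_domains=None):
    """Count domains seen for the first time, bucketed by hour (UTC).

    query_log: iterable of (unix_timestamp, domain), assumed time-ordered.
    known_domains: optional set of previously observed domains;
        it is updated in place as new names are seen.
    """
    seen = known_domains if known_domains is not None else set()
    counts = Counter()
    for ts, domain in query_log:
        # Normalize case and a trailing root dot so variants count once.
        name = domain.rstrip(".").lower()
        if name in seen:
            continue
        seen.add(name)
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:00")
        counts[hour] += 1
    return counts
```

At the scale of millions of new names a day, an exact in-memory set would likely give way to a persistent or probabilistic store, but the first-seen logic stays the same.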

The security-related significance of new domains comes from understanding the reason new domains are created. Approximately five million new domain names surface every day. We don't assume five million new people or companies decided to launch a new website. Rather, we see many of them as new domains that are generated by machines--specifically via Domain Generation Algorithms (DGAs)--that are purpose-built to serve as C&C servers for malware. Rarely are such domains created for a new barbershop down the street.

The statistics for known vs. less-known (or suspicious) new domains are surprising. By analyzing the number of domains generated each month and the number of queries against those domains, Nominum Data Science can gauge how many domains are generated with malicious intent. In the report's data, 75 percent of all new domains for the six-month period had only a single query against them. A few of them are accidental typos, but the majority are likely to have been created with malicious intent--either to take an active part in cybercrime, or to deceive security organizations fighting cybercrime (the probability that a domain such as "3isgarauile.tk" is a typo is quite small). This demonstrates one of the unique qualities of DNS compared to other network security methods: with as little as a single transaction, DNS analysis can tell with a high level of confidence that something malicious is happening. The other 25 percent of new domains, those with more than a single query, are also mostly malicious. Here, our algorithm can confidently connect seemingly unrelated new domains into clusters.
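
Two of the checks described above are easy to sketch. The first function computes the share of new domains that received exactly one query. The second flags a name as a plausible typo only if it closely resembles a well-known domain, using Python's standard difflib module. The list of popular domains and the 0.8 similarity cutoff are assumptions chosen for illustration, not values from the report.

```python
from collections import Counter
from difflib import get_close_matches


def single_query_share(new_domain_queries):
    """Fraction of new domains that received exactly one query.

    new_domain_queries: iterable of domain names, one entry per query.
    """
    counts = Counter(new_domain_queries)
    if not counts:
        return 0.0
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(counts)


def looks_like_typo(domain, popular_domains, cutoff=0.8):
    """True if `domain` is a near miss of some well-known domain.

    popular_domains: list of legitimate domains (hypothetical input).
    An exact match is not a typo, so it returns False.
    """
    if domain in popular_domains:
        return False
    return bool(get_close_matches(domain, popular_domains, n=1, cutoff=cutoff))
```

With a list such as ["google.com", "example.com"], a name like "gogle.com" is flagged as a likely typo, while "3isgarauile.tk" resembles nothing on the list and is not.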

Since these domains are likely malicious, top providers trust Nominum to block access to such domains until they are verified as legitimate. On average, Nominum blocks nearly 100 million queries daily. Doing so provides a sound security measure to avoid exposing provider networks to 'unknown' threats.
