By Yohai Einav, Hongliang Liu
It's been 18 months since Mirai entered our lives, and, unfortunately, we expect it to remain part of our cyber-world for years to come. If we look at the big picture, all indicators suggest that the problem posed by Mirai and its descendants is only going to grow. Two primary reasons are the growing number of IoT devices in the world and the improvement in IoT hardware, which makes these devices a more enticing target for attackers (better computing power means a potential for more advanced attacks).
This makes Mirai research more urgent and, in turn, makes DNS-based security more important. There are very few points at which you can stop Mirai, and blocking its C&C communications at the DNS layer is one of the most effective: it disrupts the bots' ability to receive commands and turns them into less-harmful zombies.
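To make the idea of DNS-layer blocking concrete, here is a minimal sketch of the check a filtering resolver could apply to each query. It is an illustration only, not a description of our production system: the function names, the NXDOMAIN policy, and the blocklist entries (placeholders under reserved example domains) are all assumptions.

```python
def normalize(name: str) -> str:
    """Lower-case a DNS name and drop the trailing root dot."""
    return name.strip().rstrip(".").lower()


def is_blocked(qname: str, blocklist: set) -> bool:
    """Return True if qname, or any parent domain of it, is on the blocklist."""
    labels = normalize(qname).split(".")
    # Check "a.b.example.com", then "b.example.com", then "example.com", ...
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def resolve_or_block(qname: str, blocklist: set) -> str:
    """Decide what a (hypothetical) filtering resolver would answer."""
    return "NXDOMAIN" if is_blocked(qname, blocklist) else "RESOLVE"


# Placeholder names, not real indicators.
BLOCKLIST = {normalize(d) for d in ["cnc.example", "bad-c2.example.net"]}
```

Matching on parent domains means a blocked C&C domain also covers any subdomain a bot may query, while unrelated names that merely share a suffix string (such as notcnc.example) are left alone.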
Since our team is now an integrated part of Akamai (hence the new akam:AI Research team name), we have the pleasure of obtaining data collected through the company's security platform (thank you Akamai SIRT!). This new data synergy allows us to examine domains that the company suspects of being malicious Mirai C&Cs, and to see whether our unique engines and data can substantiate and/or augment this suspicion. This work is the start of a collaboration with our peers within Akamai, and we anticipate expanding it to all parts of the company's security and threat intelligence community.
Data & Traffic Analysis
The data we received from Akamai's Infosec honeypots included a list of over 500 suspicious Mirai C&C domains, collected from January 23-25, 2018.
The first step of our analysis was to see what footprint these domains had in the DNS data around the reporting period. The following graph shows the traffic to selected domains:
The first observation from this graph is that, yes, this is bot-generated traffic. Similar cycles of queries, in both frequency and volume, strongly suggest an automated query process, a symptom of a botnet. The number of bots, or distinct Mirai-infected devices, was under 600 in our sample dataset. Most of the bots queried a single Mirai C&C, while some queried over 10 different C&C domains. This was their hourly distribution:
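As a rough illustration of how numbers like these can be derived from resolver logs, the sketch below counts distinct bots, the number of C&C domains each bot queried, and the number of active bots per hour. The record layout (timestamp, client identifier, query name) and the function name are assumptions made for the example, not our actual log schema.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def summarize_bots(records, cnc_domains):
    """Summarize C&C query activity.

    records: iterable of (unix_ts, client_id, qname) tuples (assumed layout).
    cnc_domains: set of lower-case C&C domain names without a trailing dot.
    """
    domains_per_bot = defaultdict(set)   # client -> C&C domains it queried
    bots_per_hour = defaultdict(set)     # hour bucket -> clients seen
    for ts, client, qname in records:
        name = qname.rstrip(".").lower()
        if name not in cnc_domains:
            continue  # ignore queries that are not for a known C&C
        domains_per_bot[client].add(name)
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:00")
        bots_per_hour[hour].add(client)
    # fanout[n] = number of bots that queried exactly n distinct C&C domains
    fanout = Counter(len(domains) for domains in domains_per_bot.values())
    hourly = {hour: len(clients) for hour, clients in sorted(bots_per_hour.items())}
    return len(domains_per_bot), fanout, hourly
```

The fan-out counter is what separates bots tied to a single C&C from the ones that cycle through many domains.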
After reviewing the traffic, the next step was to measure the correlation between the domains in the dataset. A strong correlation supports the conclusion that the domains are indeed being used by the same botnet. It also allows us to group the domains into clusters, where each cluster represents a certain group of infected IoT devices. This can later help pinpoint the attacker.
To perform the correlation we used our Domain2Vec model (further reading about the model: https://www.botconf.eu/2017/augmented-intelligence-to-scale-humans-fighting-botnets), which generated some interesting results:
In the graph above, each node (in yellow) is a Mirai C&C domain (with query type A), and the edge weight is the correlation score between two nodes. As can be seen here, all the domains in the graph have strong correlation (0.93 and above) to each other. This shows that they represent a cluster, and belong to the same group.
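The grouping step can be sketched in a few lines once pairwise correlation scores are available. The example below does not reproduce Domain2Vec itself; it assumes the scores already exist as a dictionary and simply links any two domains whose score meets a threshold (0.93 here, taken from the graph described above), then reports the connected groups. Using connected components is an assumption on our part for the sake of a short example; a stricter rule would require every pair inside a cluster to pass the threshold.

```python
def cluster_domains(scores, threshold=0.93):
    """Group domains whose pairwise correlation meets the threshold.

    scores: dict mapping (domain_a, domain_b) -> correlation score (assumed input).
    Returns a list of clusters, each a sorted list of domain names.
    """
    graph = {}
    for (a, b), score in scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk over strongly correlated neighbours.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters
```

Each returned cluster is a candidate group of domains used by the same set of infected devices.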
Besides identifying groups, the clustering process helps us exclude outliers: domains that don't have a high correlation score with any other node. Such domains are unlikely to be Mirai C&Cs, although that doesn't mean they're not malicious. Through the correlation and clustering process we were able to identify and exclude several domains not related to Mirai; here are examples of some of the outliers:
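The outlier rule described here (no high score with any other node) is simple to express in code. The following is a minimal sketch under the assumption that the pairwise scores are available as a dictionary; the function name and the 0.93 cutoff are illustrative choices, not the values used in our pipeline.

```python
def split_outliers(scores, threshold=0.93):
    """Separate domains by their best correlation with any other domain.

    scores: dict mapping (domain_a, domain_b) -> correlation score (assumed input).
    Returns (kept, outliers), both sorted lists of domain names.
    """
    best = {}
    for (a, b), score in scores.items():
        # Track the highest score each domain achieves with any partner.
        best[a] = max(best.get(a, float("-inf")), score)
        best[b] = max(best.get(b, float("-inf")), score)
    kept = sorted(d for d, s in best.items() if s >= threshold)
    outliers = sorted(d for d, s in best.items() if s < threshold)
    return kept, outliers
```

A domain that lands in the outlier list is only excluded from the Mirai cluster; it would still deserve a separate look before being treated as benign.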
Server behavior clustering, and its related algorithms, are also discussed in the recent Akamai State of the Internet report (https://www.akamai.com/de/de/multimedia/documents/state-of-the-internet/q2-2017-state-of-the-internet-security-report.pdf).
Augmenting the Dataset
To make things even more interesting, we extended the dataset by looking at other domains queried by the Mirai bots around the same time they queried the Mirai C&Cs. As you might expect, there was once again a strong correlation between the additional domains. This indicates that they belong to the same botnet and are controlled by the same bot herder. Here are some of the top additional correlated domains:
[Note that sss.snicker.ir is actually a subdomain under one of the known C&Cs; it was not detected through the honeypot, yet the clustering process was able to augment the intelligence around it.]
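The augmentation step can be approximated as follows: for every client that queried a known C&C, collect the other domains it queried within a short time window of a C&C query, and rank those domains by how many distinct bots asked for them. This is a sketch under assumptions (the record layout, the five-minute window, and the function name are ours for illustration), and it only produces candidates, which would still need to go through the correlation step described earlier.

```python
from collections import defaultdict


def co_queried_domains(records, cnc_domains, window_seconds=300):
    """Rank non-C&C domains queried by bots close in time to a C&C query.

    records: iterable of (unix_ts, client_id, qname) tuples (assumed layout).
    Returns a list of (domain, distinct_bot_count), most shared first.
    """
    cnc_times = defaultdict(list)  # client -> timestamps of its C&C queries
    others = []                    # queries for domains not on the C&C list
    for ts, client, qname in records:
        name = qname.rstrip(".").lower()
        if name in cnc_domains:
            cnc_times[client].append(ts)
        else:
            others.append((ts, client, name))

    bots_per_domain = defaultdict(set)
    for ts, client, name in others:
        # Keep the query only if this client also hit a C&C within the window.
        if any(abs(ts - t) <= window_seconds for t in cnc_times.get(client, ())):
            bots_per_domain[name].add(client)

    ranked = [(domain, len(bots)) for domain, bots in bots_per_domain.items()]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```

Counting distinct bots rather than raw queries keeps a single chatty device from pushing an unrelated domain to the top of the list.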
A noteworthy point about these additional domains (which, again, were not detected in the honeypots) is that many of them are known to distribute the Bad Rabbit ransomware. This ransomware is believed to have been created by the same group that brought us Petya, and Bad Rabbit is therefore considered a variant of the Petya ransomware (see: https://www.riskiq.com/blog/labs/badrabbit/).
This dual Mirai C&C and Bad Rabbit usage by the same bots makes us wonder about the evolution of IoT botnets. Until now, Mirai's nearly exclusive activity has been launching DDoS attacks; it now appears that compromised IoT devices are being put to more sophisticated uses, such as ransomware distribution or crypto-mining. So not only are more IoT devices getting infected, but there are also more malicious activities they can perform once infected.
An IoT device infected with Mirai gives its user very little visibility or indication of compromise, yet it is as malicious as any other infected device. Since the user can't see the threat, it is our job, as security professionals, to analyze the network traffic, identify the IoT threat for our users, and block it. Following the research described here, we added dozens of C&C domains to our security block lists.
We believe that this exercise is a first step in applying AI tools to support the greater security good, and part of a trend that is only going to continue and gain momentum in the near future. It is also an exercise that demonstrates how data collaboration can make the sharing parties stronger: not only is the whole greater than the sum of its parts, but the results also flow back and enhance each of the (standalone) parts behind it.
Selected Indicators of Compromise (IoC)
Mirai C&C domains in original dataset:
Non-Mirai C&Cs in original dataset:
Additional augmentation C&C domains (January 29):