
Anomaly scoring is a better way to detect a real attack

The following is a guest post from Principal Enterprise Architect David Senecal and Principal Product Architect Ory Segal.


Internet security is constantly evolving, and it's a challenge for every company generating online revenue. Not only do these companies need to constantly reinvent themselves by adding functionality that lets their users do more, but at the same time they need to protect their online transactions.

How to block a threat and not a real user?

One of the key problems in any security solution is how to handle false positives and false negatives - that is, how to avoid blocking valid users, while not missing malicious activity against the system. Web application firewalls (WAF) are no exception.

At Akamai, we have been working with the OWASP ModSecurity Core Rule Set (CRS) for quite some time and have gained extensive mileage with the system. The problem we ran into with previous Core Rule Sets was dealing with exactly this issue.

In some scenarios, a single rule firing on an HTTP request is not deterministic enough to indicate a real attack. For example, finding the word "script" or "alert" independently in a request is not a good indication that a Cross Site Scripting attack is taking place.

However, if you find both keywords together with special markup characters in between (something like "<script>alert("xss");</script>"), the malicious intent becomes much more obvious.

[Image: Scoring 1.png]

Improving threat detection accuracy

In version 2.x of the CRS, OWASP introduced the concept of anomaly scoring as a way to detect attacks more accurately. Each rule is built so that it holds only one piece of the puzzle and is assigned a score. As the WAF parses a request through the multiple rules that make up the CRS, it keeps track of the rules that fire and adds up their scores to compute a total anomaly score for the request. The WAF then compares the request's anomaly score with an inbound risk score threshold. If the score exceeds the threshold, the request is likely to be malicious; otherwise, the request is judged to be safe.

At a high level, the principle is simple, but to make it efficient there are some rules to follow:

  • Each rule in the rule set should look for specific keywords or patterns that are typical of an attack
  • No single rule should try to hold all the keywords typically found in an attack payload
  • Each rule must be given a score between 1 (informational) and 5 (critical), assigned based on the risk it represents
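
To make the mechanism concrete, here is a minimal, hypothetical sketch of anomaly scoring. The rules, patterns, scores, and threshold are illustrative only; they are not the actual OWASP CRS or Kona Rule Set rules.

```python
import re

# Hypothetical rules: each holds a single piece of evidence and a score from 1 to 5.
RULES = [
    (re.compile(r"<\s*script", re.I), 4),             # opening script tag
    (re.compile(r"\balert\s*\(", re.I), 2),           # JavaScript alert() call
    (re.compile(r"\bunion\b.*\bselect\b", re.I), 5),  # classic SQL injection keyword pair
]

INBOUND_THRESHOLD = 5  # illustrative inbound anomaly score threshold

def anomaly_score(payload: str) -> int:
    """Add up the score of every rule that fires on the payload."""
    return sum(score for pattern, score in RULES if pattern.search(payload))

def is_likely_malicious(payload: str) -> bool:
    return anomaly_score(payload) >= INBOUND_THRESHOLD

# A lone keyword scores too low to block, but the combined pattern crosses the threshold.
print(is_likely_malicious("please alert me about script changes"))  # False
print(is_likely_malicious('<script>alert("xss");</script>'))        # True
```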

ModSecurity 2.x comes with two risk score rules: one that keeps track of all the rules that fired during the request stage, and another that adds the scores of the rules that fire during the response stage. In practice, we discovered that it is very difficult, if not impossible, to find a single threshold that works across the different types of attacks. The graph below shows the ideal threshold (highlighted in blue) for each type of attack.

[Image: Scoring 2.png]

Akamai's Threat Research Team went back to the drawing board and took this concept a step further, introducing attack-specific risk score rules (Cross Site Scripting, SQL Injection, Command Injection, PHP Injection, HTTP Anomaly, Trojan, and Remote File Include attacks). The result is the new Kona Rule Set, which aims to reduce false positives and more accurately detect true attacks.

CRS 2.x in action

In order to put the new Kona Rule Set to the test using a proper methodology, Akamai's Threat Research Team compared the accuracy of:

  • A WAF policy running the Akamai Kona Rule Set
  • A WAF policy running the CRS 1.6.1 rule set with all rules in deny mode
  • A WAF policy running the standard CRS 2.2.6 rule set (vanilla OWASP CRS 2.2.6)

The testing process used both valid traffic (to measure false positives), as well as attack traffic (to measure false negatives).

We have been running an opt-in beta program with some of our customers to improve WAF accuracy for them. As a result, we've been able to create a valid traffic sample that includes real-world Internet traffic from some of the world's top 100 websites, including large amounts of real-world traffic known to cause false positives. Attack traffic from popular hacking tools, exploit tools, and web security scanners was also included. These attack test cases represented 5% of the total sample set.

We considered the following measures:

  • Precision: % of blocked requests that were actual attacks
  • Recall: % of attacks that were actually blocked
  • Accuracy: % of decisions that were true
  • MCC*: Correlation between WAF decision and the actual nature of requests

* MCC is Matthews Correlation Coefficient: http://en.wikipedia.org/wiki/Matthews_correlation_coefficient
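
For reference, all four measures can be derived from the counts of true/false positives and negatives produced by the test. The sketch below is a generic illustration of the formulas, not the tooling used in the experiment, and the example numbers are made up.

```python
from math import sqrt

def waf_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, accuracy, and MCC from a confusion matrix.

    tp: attacks blocked, fp: valid requests blocked,
    tn: valid requests allowed, fn: attacks allowed.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "mcc": mcc}

# Hypothetical example: 950 of 1,000 attacks blocked, 30 false positives among 19,000 valid requests.
print(waf_metrics(tp=950, fp=30, tn=18970, fn=50))
```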

The table below shows the results of the experiment.

[Image: Scoring 3.png]

Why should you use Anomaly Scoring?

The results clearly demonstrate that the policy running the Kona Rule Set blocked more real attacks than any other policy. Overall, the Kona Rule Set's decisions correlate more closely with the actual nature of the traffic: it detects more actual attacks with a lower level of false positives.

It is worth mentioning that the measurements were done against an "out of the box" non-tuned configuration - specific WAF deployments are expected to have even better results using custom rules and more application-specific tuning.

Akamai Professional Services can help you participate in the Kona Rule Set beta program. We are always looking for customers to partner with on our security research, to improve our Kona security suite and reduce false positives even further.


David Senecal is Principal Enterprise Architect and Ory Segal is Principal Product Architect at Akamai.


Slow DoS on the Rise

The following is a guest post from Senior Enterprise Architect David Senecal and Senior Solutions Architect Aseem Ahmed.


Recent years have been very dramatic in the security landscape, with emerging threats making the application layer a more prominent target. A class of deadly Layer 7 attacks called slow HTTP Denial of Service (DoS) attacks is on the rise. Although they are not as new as they might sound, anything that is not frequently spoken of in the security world is often treated as new!


In my experience, the common perception of DoS is a volumetric attack that occurs quickly, not slowly.  Traditionally, DoS/DDoS attacks have been volumetric, requiring a large number of clients that could be geographically distributed.  But slow attacks that consume minimal resources and bandwidth on the attacker's side can still bring down your Web server.


Here are the results of using slowhttptest against a vulnerable Apache server in our lab environment.  The snippet below shows the tool in action, with new connections established very quickly with the server.  The Web server becomes unavailable within 5 seconds of launching the attack.


[Image: SlowPost 1.png]

The HTML screenshot below shows the results of the same test.  The tool opens 1000 connections at a rate of 200 connections per second, and the server is able to process only 351 connections concurrently, leaving the remaining 649 connections pending.


[Image: SlowPost 2.png]


Why is this a problem?


  • These connections look like legitimate user connections.
  • Traditional rate detection techniques will not flag them.
  • Existing IPS/IDS solutions that rely on signatures will generally not recognize them either.
  • They require very few resources and little bandwidth to execute.
  • Such an attack can bring down a Web server, irrespective of its hardware capabilities.


How do these attacks work?


Slow HTTP DoS attacks rely on the fact that a Web server will faithfully honor client requests.  The attacker's motive is to send legitimate-looking requests that keep the server's resources busy for as long as possible. If the attacker keeps adding such long-running requests, the server can quickly run out of resources.


Slow HTTP Denial of Service attacks have different variants, but before we get into the details, let's review the normal HTTP structure and flow:


Request

Headers

POST /account/login HTTP/1.1 CRLF
Accept: */* CRLF
Content-Type: application/x-www-form-urlencoded CRLF
Content-Length: 60 CRLF
Connection: keep-alive CRLF
Host: www.customer.com CRLF
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:22.0) Gecko/20100101 Firefox/22.0 CRLF

Body

email=customer%40account.com&password=mypasswrd

Response

Headers

HTTP/1.1 200 OK CRLF
Server: Apache/2.2.22 (Ubuntu) CRLF
Content-Type: text/html CRLF
Content-Length: 200 CRLF
Date: Fri, 12 Jul 2013 05:31:32 GMT CRLF
Connection: Keep-Alive CRLF

Body

<html>
   <head>
   .....
  </head>
</html>
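
A slow HTTP DoS attack abuses exactly this structure. In the slow POST variant, for example, the client declares a large Content-Length in the request headers and then delivers the body a few bytes at a time, keeping the connection (and a server worker) tied up for as long as possible. The sketch below is a simplified, lab-only illustration of that mechanism; the hostname and timings are made up, and tools like slowhttptest automate the same idea at scale. Never run anything like this against a server you don't own.

```python
import socket
import time

TARGET = "lab-server.example.com"  # hypothetical lab target -- for illustration only
PORT = 80

def slow_post_connection():
    """Open one connection and drip the POST body one byte every 10 seconds."""
    s = socket.create_connection((TARGET, PORT))
    headers = (
        "POST /account/login HTTP/1.1\r\n"
        f"Host: {TARGET}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "Content-Length: 1000000\r\n"  # promise a very large body...
        "Connection: keep-alive\r\n"
        "\r\n"
    )
    s.sendall(headers.encode())
    for _ in range(1000):              # ...but deliver it one byte at a time
        s.sendall(b"a")
        time.sleep(10)
    s.close()

# An attacker opens hundreds or thousands of these connections in parallel,
# exhausting the pool of worker threads or processes on a vulnerable server.
```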




Identifying and mitigating unwanted bot traffic

All websites connected to the public Internet receive bot traffic on a daily basis.  A recent study shows that bots drive 16% of Internet traffic in the US; in Singapore, this number reaches 56%. Should you be worried? Well, not necessarily. Not all bot traffic is bad, and some of it is even vital to the success of a website. Websites are also affected differently depending on the profile of the company, the value of its content, and the popularity of the site.

Defining bots

What are the different types of bots?

  • White bots (good), like search engines (Google, Bing, Baidu), help drive more customers to the site and therefore increase revenue. They also help monitor site availability and performance (Akamai site analyzer, Keynote, Gomez) and proactively look for vulnerabilities (Whitehat, Qualys).
  • Black bots (bad) send additional traffic to the site that may impact its availability and integrity. Bad bot traffic can drive customers away from the site, negatively impacting revenue and the website's reputation. Examples include hackers trying to bring down a site with a DDoS attack or exposing and exploiting vulnerabilities, and competitors or other actors scraping a site to harvest pricing information for financial gain.
  • Grey bots (neutral) don't necessarily help drive more customers to the site, nor do they specifically seem to cause any harm. Their identity and intent are more difficult to define; they usually present the characteristics of a bot but are generally not aggressive. Such traffic only occasionally causes problems, due to a sudden increase in request rate.

 

Identifying bot traffic

Dealing with bot traffic can be challenging, and proactive measures should be taken to prevent any negative impact on the site. Monitoring bot activity is key. The one thing that all bots have in common is that they only request base HTML pages, which usually contain the valuable information but are also more processing-intensive for the web server to generate. Bots generally never request the embedded objects (images, JavaScript, Cascading Style Sheets) because the client doesn't need to render the full page.
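
As a rough illustration of this heuristic, the sketch below scans a web server access log and flags client IPs that request pages but never fetch embedded objects. The log layout and field positions are assumptions (a common combined-log format) and the file name is hypothetical; adjust both for your environment.

```python
from collections import defaultdict

EMBEDDED_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".gif", ".ico", ".woff")

def likely_bots(log_lines):
    """Return client IPs that requested pages but no embedded objects."""
    pages = defaultdict(int)    # base page requests per IP
    objects = defaultdict(int)  # embedded object requests per IP
    for line in log_lines:
        parts = line.split()
        if len(parts) < 7:
            continue
        ip, path = parts[0], parts[6]  # combined log format: client IP first, request URL seventh
        if path.lower().split("?")[0].endswith(EMBEDDED_EXTENSIONS):
            objects[ip] += 1
        else:
            pages[ip] += 1
    return [ip for ip in pages if objects[ip] == 0]

# Usage (hypothetical log file):
# with open("access.log") as f:
#     print(likely_bots(f))
```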

 

Now that we know how to find the bot traffic, the next step is to identify the different types of bots.

  • White bot traffic is usually predictable. It will have a specific header signature and will come from IPs belonging to the companies managing the bot. It is possible to control what these bots can request on the site through robots.txt or through the administration interface of the service managing the bot activity.
  • Black bot header signatures vary widely, from exactly mimicking a genuine browser or search engine request to presenting several anomalies, such as missing headers or atypical headers in the request (see the sketch after this list). Black bots may also send requests at a higher rate.
  • Grey bot traffic can be more challenging to identify since it generally presents the same characteristics as black bot traffic.
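
To illustrate what such a header signature check might look like, here is a minimal, hypothetical sketch. The header names and User-Agent tokens it checks are examples only, not Akamai's actual rules.

```python
# Hypothetical header checks -- illustrative only, not Kona Site Defender rules.
KNOWN_BOT_TOKENS = ("curl", "python-requests", "wget", "scrapy")
EXPECTED_BROWSER_HEADERS = ("user-agent", "accept", "accept-language", "accept-encoding")

def header_anomalies(headers: dict) -> list:
    """Return a list of anomalies found in a request's headers."""
    normalized = {k.lower(): v for k, v in headers.items()}
    anomalies = []
    for name in EXPECTED_BROWSER_HEADERS:
        if name not in normalized:
            anomalies.append(f"missing header: {name}")
    ua = normalized.get("user-agent", "").lower()
    if any(token in ua for token in KNOWN_BOT_TOKENS):
        anomalies.append(f"known bot token in User-Agent: {ua}")
    return anomalies

print(header_anomalies({"User-Agent": "curl/7.64.1", "Accept": "*/*"}))
# ['missing header: accept-language', 'missing header: accept-encoding',
#  'known bot token in User-Agent: curl/7.64.1']
```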

 

In order to effectively identify bot activity, it is necessary to implement and deploy a set of rules that look at the traffic from different perspectives. Several features of the Kona Site Defender product can help:

  • The WAF application layer control feature consists of the ModSecurity core rule set and the Akamai common rule set. Some of the rules are specifically designed to look for anomalies in the headers, or to look for known bot signatures in the User-Agent header value or in combinations of headers in the request.
  • The rules mentioned above can be complemented with several WAF custom rules to help identify specific header signatures.
  • The WAF adaptive rate control feature can also be used to monitor excessive request rates from individual clients (a simple rate-tracking sketch follows this list).
  • Lastly, the User Validation Module (UVM) can be used to perform client-side validation in extreme situations, when none of the "traditional" methods seem to help.
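
As a simple illustration of rate-based detection (not how Kona's adaptive rate control is actually implemented), here is a sliding-window counter that flags clients exceeding an assumed request-rate threshold.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10
MAX_REQUESTS_PER_WINDOW = 50  # assumed threshold; tune for your traffic

class RateTracker:
    """Track per-client request timestamps in a sliding window."""

    def __init__(self):
        self.history = defaultdict(deque)

    def is_excessive(self, client_ip: str) -> bool:
        """Record a request for this client and report whether its rate is excessive."""
        now = time.time()
        window = self.history[client_ip]
        window.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        return len(window) > MAX_REQUESTS_PER_WINDOW

tracker = RateTracker()
# Call tracker.is_excessive(ip) for each incoming request and act on True
# (alert, deny, or serve alternate content).
```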

 

Mitigating bot traffic

Once bot traffic is identified, the next step is to decide what to do with the black and grey bot traffic. You may decide to simply monitor the traffic over time and only take action should the activity become too aggressive and represent a threat to the stability of the website. Or you may decide to take action as soon as the activity is identified, regardless of the volume of traffic generated. The type of action taken may vary depending on your business needs (a short sketch of how these options might be combined follows the list):

  • Deny the traffic: this is the default but least elegant solution; the client will receive an HTTP 4xx or 5xx response code. This gives the bot operator a clear indication that such activity is not allowed on the site and that they've been identified by some security service or device. Bot operators may then vary the format of their requests to see if they can stay under the radar.
  • Serve alternate content: the content served could vary from a generic "site unavailable" page to something that looks like a real response but contains only generic data. This strategy may slow down the bot operator and keep them in the dark as to why they cannot access the data they want.
  • Serve a cached, stale, or static version of the content: this is the best strategy of all, but it is not always possible to implement; some content simply cannot be cached or stored as static data on an alternate origin, because of compliance concerns or its dynamic nature. It could take the bot operator some time to realize the data they are getting is worthless, and an attacker running a DDoS against the site would also get discouraged and move on to a different target.
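
Putting the options above together, here is a hypothetical sketch of how a response could be dispatched based on the bot classification and the chosen strategy. The stub helpers stand in for whatever your edge or application infrastructure actually provides.

```python
# Hypothetical response dispatch -- the stubs below are placeholders, not a real API.
def serve_normally(request):      return ("200 OK", "real content")
def http_error(code):             return (f"{code} Forbidden", "")
def serve_generic_page(request):  return ("200 OK", "generic placeholder data")
def serve_cached_copy(request):   return ("200 OK", "stale copy from cache or an alternate origin")

def respond_to_bot(classification: str, strategy: str, request) -> tuple:
    """Pick a response based on the bot classification and the chosen strategy."""
    if classification == "white":
        return serve_normally(request)   # known good bots always get real content
    if strategy == "deny":
        return http_error(403)           # blunt, and tips off the bot operator
    if strategy == "alternate":
        return serve_generic_page(request)
    if strategy == "cached":
        return serve_cached_copy(request)
    return serve_normally(request)       # default: monitor only

print(respond_to_bot("black", "cached", request=None))
```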


David Senecal is senior enterprise architect at Akamai.  Patrice Boffa is a director of global service delivery at Akamai.