Things I wish I knew about AWS WAF - Bot Control
AWS WAF is often the first layer of defense for websites hosted on AWS. While WAF does its best at blocking web attacks, it doesn’t stop web abuse - like bot attacks involving API abuse. For example, submitting spam comments on pages, credential spraying, OTP brute force/resend, etc.
The reason for this is pretty intuitive. WAFs usually don’t have context on what counts as abuse, how the resources can get abused, or how to prevent it. So it’s up to the security engineer setting up the WAF to write custom rules to prevent such abuse.
Do you think AWS WAF’s rate-limiting feature can help block abuses like OTP brute force/resend? Pause now and think about it.
In most cases I’ve seen, the rate-limiting feature can’t block such abuse because of its caveats. The minimum rate limit you can set is 100 requests per 5 minutes - nothing lower. Against abuses like bots spraying stolen credentials or brute-forcing passwords across multiple IPs, 100 requests per 5 minutes per IP can still do noticeable damage.
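For reference, a rate-based rule in the web ACL JSON looks roughly like this (the rule name, priority, and metric name here are placeholders; in practice you’d scope it down to the abused endpoint with a `ScopeDownStatement`):

```json
{
  "Name": "rate-limit-per-ip",
  "Priority": 1,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 100,
      "AggregateKeyType": "IP"
    }
  },
  "Action": { "Block": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "rate-limit-per-ip"
  }
}
```

Note that `Limit: 100` is already the floor - which is exactly the caveat above.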
After such an API abuse incident, I wanted to check if AWS WAF’s Bot Control feature could help stop such abuse. There weren’t any useful blog posts on the internet about the feature, so I decided to write one.
This blog post walks you through the bot control feature, its pricing, how it works, and the results after enabling it.
The first few sentences of the Bot Control documentation explain it well:
AWS WAF Bot Control gives you visibility and control over common and pervasive bot traffic that can consume excess resources, skew metrics, cause downtime, or perform other undesired activities. With just a few clicks, you can use the Bot Control managed rule group to block or rate-limit pervasive bots, such as scrapers, scanners, and crawlers, or you can allow common bots, such as status monitors and search engines.
Bot control comes with a subscription fee of $10.00 per month per Web ACL. Additionally, there is a request fee of $1.00 for every million requests inspected.
You can read more about the pricing here.
It’s very simple to enable Bot Control.
Go to the WAF dashboard, click on the web ACL, head over to Rules, click “Add managed rule groups” under “Add rules,” and you will see “Bot Control” in AWS managed rule groups (it’s the only paid rule group under the category 😉 ).
Toggle “Add to web ACL,” set its priority, and save the changes.
Voila, you just enabled bot control.
If you deploy AWS WAF web ACL rules using JSON files, enabling it is even easier. Just copy-paste the below at the bottom of the ruleset.
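The rule entry looks roughly like this (the priority is a placeholder; the rule and metric names follow AWS’s usual convention for managed rule groups):

```json
{
  "Name": "AWS-AWSManagedRulesBotControlRuleSet",
  "Priority": 10000,
  "Statement": {
    "ManagedRuleGroupStatement": {
      "VendorName": "AWS",
      "Name": "AWSManagedRulesBotControlRuleSet"
    }
  },
  "OverrideAction": { "Count": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "AWS-AWSManagedRulesBotControlRuleSet"
  }
}
```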
Note: The above rule enables Bot Control in Count mode. It doesn’t block any request upon enabling bot control (you will understand why I emphasize this later).
WAF rule decisions made earlier in the processing order decide whether Bot Control kicks in. For example, if an earlier rule explicitly blocks requests matching certain criteria, those requests get dropped immediately and never get checked for bot signals. Every non-blocked request that reaches the Bot Control rule can get labeled.
To see this feature in action, I enabled it on a production web ACL and set the priority as 10000 (i.e., the last rule to be processed).
The production web ACL usually gets ~10 million genuine requests every week. I intentionally chose it so I could understand Bot Control’s behavior on real traffic.
Here’s what I found.
The requests originating from popular web browsers didn’t get labeled. So as per bot control, these are genuine requests and not bots.
Requests from Android/iOS apps, curl, Python/Golang HTTP libraries, etc., get labeled with their respective bot categories.
Labels for requests from the Android app:
Labels for requests from random internet bruteforcer hosted on DigitalOcean:
Labels for webhook callback requests from DigitalOcean (with valid browser User-Agent):
From the above examples, you can see the labels have at least one bot signal. They can have bot categories and names to further differentiate the bot.
Looking at all the requests and their labels over the week, I found its logic depends on two factors: the User-Agent header and the source IP address.
The bot control signals can be summarized based on these two factors:
- AutomatedBrowser - if the User-Agent belongs to an automation framework (like Puppeteer, Headless Chrome, etc.)
- KnownBotDataCenter - if the source IP address is part of bot data centers or cloud service providers
- NonBrowserUserAgent - if the User-Agent doesn’t belong to a valid browser, or if there’s no User-Agent in the request
Bot Control does a decent job of labeling requests. You can then have a WAF rule take a custom action based on the labels. For example, if a request’s labels include signal:NonBrowserUserAgent but not name:okhttp (requests coming from an Android app), then block it, rate limit it, or even send a CAPTCHA challenge.
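As a sketch, that example can be expressed with label match statements in the web ACL JSON (the label keys below are the ones I observed in my WAF logs; verify them against your own logs before deploying):

```json
{
  "Name": "block-non-browser-except-okhttp",
  "Priority": 10001,
  "Statement": {
    "AndStatement": {
      "Statements": [
        {
          "LabelMatchStatement": {
            "Scope": "LABEL",
            "Key": "awswaf:managed:aws:bot-control:signal:non_browser_user_agent"
          }
        },
        {
          "NotStatement": {
            "Statement": {
              "LabelMatchStatement": {
                "Scope": "LABEL",
                "Key": "awswaf:managed:aws:bot-control:bot:name:okhttp"
              }
            }
          }
        }
      ]
    }
  },
  "Action": { "Block": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "block-non-browser-except-okhttp"
  }
}
```

This rule must be processed after the Bot Control rule group, since the labels only exist once that group has run.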
Let’s say I have a WAF ACL with three AWS Managed Rule groups enabled and there are a million web requests every month.
Then my price is:
- WAF ACL - $5.00
- Rules - 3 * $1.00 = $3.00
- Charge for AWS managed rule groups - $0.00
- Requests - $0.60 (for a million requests)
My total WAF cost is $8.60 per month.
Now I enable Bot Control. Additional costs are:
- Bot Control Subscription fee - $10.00
- Bot Control request fee - $1.00 (for a million requests)
Now my total WAF cost with Bot Control enabled is $19.60 per month.
This is a screenshot from the Cost Explorer dashboard for AWS WAF.
You will have a hard time fine-tuning Bot Control rules if a lot of QA checks and automation run against your dev/staging environments.
In the staging environment, I saw a lot of requests from Postman, Puppeteer (Headless Chrome), curl, etc. If I fine-tuned the rules to ignore these requests, the same rules would also ignore them in production (which defeats the reason we have Bot Control in the first place).
Also, these environments most probably won’t have the webhook callbacks from other servers or clients that are present in prod (if any).
Because of these two reasons, fine-tuning Bot Control rules in the dev/staging environment didn’t work.
So, I enabled it on production in “Count” mode and then finetuned the rules.
Let’s say you were considering Bot Control to block bot traffic to public endpoints (like login, comment, etc.) that can be abused. The default Bot Control rules either block all bot traffic or none of it (when Bot Control doesn’t consider it bot traffic). You will need custom rules on top of the labels to make sure you block bot traffic and not genuine traffic.
KnownBotDataCenter can help block requests originating from cloud providers and other bot data centers. But you need to decide whether the traffic from those IPs is bot/crawler traffic or valid webhook callbacks.
To bypass AWS bot detection, all one needs to do is use a valid browser User-Agent and send requests from non-cloud-provider IPs (say, from a mobile connection with a non-static public IP).
With this, you would bypass even a complex bot-traffic prevention setup on AWS WAF.
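A hypothetical sketch of what such a bypass client would send - the User-Agent string below is just an example of a current desktop browser, not anything special:

```python
# Sketch of the bypass: present a real browser User-Agent (avoids the
# NonBrowserUserAgent and AutomatedBrowser signals) while originating
# from a residential/mobile IP (avoids KnownBotDataCenter).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

def bot_request_headers() -> dict:
    """Headers a bot would send to look like a regular desktop browser."""
    return {
        "User-Agent": BROWSER_UA,
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }
```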
NOTE: This is based on my observations only. There could be other factors AWS uses to detect bot traffic that I didn’t encounter, and AWS might enhance the Bot Control logic at any time without notice.
Bot Control is an interesting WAF feature that tries to detect bot traffic. Combined with other web ACL features, it allows you to create complex rules that handle bot traffic with custom actions (like block, rate limit, or CAPTCHA).
You should consider the bot control feature if:
- you already have AWS WAF enabled on your websites
- you want a simple bot detection capability and the ability to take custom actions
- you want to block some web bots and crawlers but not all
You shouldn’t consider the bot control feature if:
- you want 100% prevention of bot traffic
- you want a solution to prevent API abuse
- you are concerned about your existing AWS WAF bill
Are you still in a dilemma if you should or shouldn’t use Bot Control?
Try answering this question: would you double your AWS WAF bill for a feature that blocks less than X% of the bot traffic reaching your websites (given that the protection can be bypassed)?
Replace X with whatever percent you see in your production.
I wish I had known all this about the Bot Control feature earlier.