For years, my website suffered from relentless web crawlers—those aggressive bots that comb through sites collecting data. While some bots like Googlebot and Bingbot are helpful and necessary for SEO, others like AhrefsBot, SemrushBot, and MJ12bot are infamously resource-hungry. These “bad bots” were hammering my Ubuntu server, causing CPU usage to spike above 100% and crashing my site regularly.
After much trial and error, I finally implemented a server-level solution using Apache that effectively blocked these bots. In this blog post, I’ll walk you through exactly how I did it, so you can do the same and protect your site’s resources.
Why You Should Block Bad Bots
Not all bots are created equal. Here’s why blocking the bad ones matters:
- 🚨 High CPU usage: Many aggressive crawlers ignore your robots.txt file and make excessive requests.
- 🛑 Downtime risks: Too much bot traffic can overwhelm your web server, leading to slow load times or complete outages.
- 💰 Wasted resources: On cloud-based VPS or dedicated servers, overuse can increase your costs.
- 📉 No SEO value: Most bad bots do not contribute to search engine rankings or visibility.
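Before blocking anything, it helps to see which user agents are actually generating the load. The sketch below counts requests per user agent. It assumes Apache's "combined" log format and the usual Ubuntu log path /var/log/apache2/access.log, so adjust both if your setup differs. The function name top_user_agents is only illustrative.

```shell
# Count requests per user agent in an Apache access log.
# Assumes the "combined" log format, where the user agent is the
# sixth field when each line is split on double quotes.
top_user_agents() {
    log_file="$1"
    limit="${2:-10}"
    awk -F'"' '{ count[$6]++ } END { for (ua in count) print count[ua] "\t" ua }' "$log_file" \
        | sort -rn \
        | head -n "$limit"
}

# Example (path is the usual Ubuntu default; yours may differ):
# top_user_agents /var/log/apache2/access.log 20
```

Each output line is a request count followed by the user agent string, busiest first, which gives you a data-driven list of candidates to block.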
Step-by-Step: How to Block Bad Bots in Apache (Ubuntu Server)
1. Create a New Apache Config File to Identify Good and Bad Bots
Create a new file called blockbots.conf in your Apache configuration directory:
sudo nano /etc/apache2/conf-available/blockbots.conf
Paste in the following configuration:
<IfModule mod_setenvif.c>
# Tag known good bots (informational only; the access rule below acts on bad_bot)
SetEnvIfNoCase User-Agent "Googlebot" good_bot
SetEnvIfNoCase User-Agent "Bingbot" good_bot
# Block known bad bots
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot
SetEnvIfNoCase User-Agent "SemrushBot" bad_bot
SetEnvIfNoCase User-Agent "MJ12bot" bad_bot
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "Bytespider" bad_bot
SetEnvIfNoCase User-Agent "DotBot" bad_bot
SetEnvIfNoCase User-Agent "PetalBot" bad_bot
SetEnvIfNoCase User-Agent "crawl" bad_bot
SetEnvIfNoCase User-Agent "spider" bad_bot
</IfModule>
<Directory "/var/www/html">
<RequireAll>
Require all granted
Require not env bad_bot
</RequireAll>
</Directory>
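Because SetEnvIfNoCase treats each pattern as a case-insensitive regular expression matched anywhere in the header, broad patterns like "crawl" and "spider" can catch more than you intend. The following sketch approximates that matching with grep so you can try user-agent strings before deploying. It only mirrors the pattern list above and is not how Apache evaluates the rules internally; is_bad_bot is a hypothetical helper name and the user-agent strings are samples.

```shell
# Patterns copied from blockbots.conf, joined into one alternation.
BAD_BOT_PATTERN='AhrefsBot|SemrushBot|MJ12bot|Baiduspider|Bytespider|DotBot|PetalBot|crawl|spider'

# Succeeds (exit 0) when the user agent would be tagged bad_bot.
# grep -i approximates SetEnvIfNoCase; -E enables the alternation.
is_bad_bot() {
    printf '%s\n' "$1" | grep -qiE "$BAD_BOT_PATTERN"
}

# Try a few sample user agents against the pattern list.
for ua in \
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)" \
    "ExampleCrawler/1.0"
do
    if is_bad_bot "$ua"; then
        echo "blocked: $ua"
    else
        echo "allowed: $ua"
    fi
done
```

If a crawler you want to keep shows up as blocked, narrow or remove the generic patterns rather than relying on the good_bot tag, which the access rule does not consult.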
2. Modify Your Apache Virtual Host File
Edit your website’s virtual host file. Example:
sudo nano /etc/apache2/sites-available/yourdomain.com.conf
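Ensure it includes a line like the one below. On a stock Ubuntu install, apache2.conf already loads every file in conf-enabled/, so this line is only needed if your setup does not include that directory globally: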
IncludeOptional /etc/apache2/conf-enabled/blockbots.conf
3. Enable the Necessary Apache Modules and Config
Enable required modules and the new config file:
sudo a2enmod authz_core
sudo a2enmod setenvif
sudo a2enconf blockbots
4. Reload Apache to Apply the Changes
Always check your Apache configuration for syntax errors before reloading:
sudo apachectl configtest
If the output is Syntax OK, proceed:
sudo systemctl reload apache2
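The two commands above can be chained so that a reload only happens when the syntax check passes. A minimal sketch, with safe_reload as a made-up wrapper name:

```shell
# Reload Apache only if the configuration passes the syntax check.
safe_reload() {
    if sudo apachectl configtest; then
        sudo systemctl reload apache2
    else
        echo "Config test failed; Apache was not reloaded." >&2
        return 1
    fi
}

# Usage:
# safe_reload
```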
How to Test If Bad Bots Are Blocked
You can use curl to simulate bot access. For example:
curl -I -A "AhrefsBot" https://www.storeshoppe.com
If the configuration is working correctly, you should see a 403 Forbidden response instead of 200 OK.
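To check several bots in one pass, a small loop around curl works. This sketch assumes curl is installed and uses https://yourdomain.com as a placeholder for your own site; status_for_agent and check_agents are illustrative helper names, not part of any tool.

```shell
# Print the HTTP status code returned for a given URL and user agent.
status_for_agent() {
    curl -s -o /dev/null -w '%{http_code}' -A "$2" "$1"
}

# Report the status code each listed user agent receives.
check_agents() {
    url="$1"
    shift
    for ua in "$@"; do
        printf '%s\t%s\n' "$(status_for_agent "$url" "$ua")" "$ua"
    done
}

# Example (replace the placeholder domain with your own):
# check_agents "https://yourdomain.com" "AhrefsBot" "SemrushBot" "Googlebot"
```

With the configuration in place, the blocked agents should report 403 while Googlebot and ordinary browsers still get 200.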
Results: Huge Performance Improvements
After applying this method:
- My server’s CPU usage dropped drastically
- No more unexplained spikes in server load
- Crawling activity is now limited to legitimate bots
- Website uptime and response speed improved
Final Thoughts
Blocking abusive bots at the Apache level is one of the most effective ways to protect your server from overload. Unlike robots.txt, which only well-behaved bots honor, this method enforces restrictions at the web server layer, so any request whose User-Agent matches a blocked pattern is refused whether the bot likes it or not.
If you’re running a website on an Ubuntu server with Apache, I highly recommend you implement this solution. Your server (and your visitors) will thank you.