How I Blocked Bad Bots in Apache and Saved My Server

For years, my website suffered from relentless web crawlers—those aggressive bots that comb through sites collecting data. While some bots like Googlebot and Bingbot are helpful and necessary for SEO, others like AhrefsBot, SemrushBot, and MJ12bot are infamously resource-hungry. These “bad bots” were hammering my Ubuntu server, causing CPU usage to spike above 100% and crashing my site regularly.

After much trial and error, I finally implemented a server-level solution using Apache that effectively blocked these bots. In this blog post, I’ll walk you through exactly how I did it, so you can do the same and protect your site’s resources.

Why You Should Block Bad Bots

Not all bots are created equal. Here’s why blocking the bad ones matters:

  • 🚨 High CPU usage: Many aggressive crawlers ignore your robots.txt file and make excessive requests.
  • 🛑 Downtime risks: Too much bot traffic can overwhelm your web server, leading to slow load times or complete outages.
  • 💰 Wasted resources: On cloud-based VPS or dedicated servers, overuse can increase your costs.
  • 📉 No SEO value: Most bad bots do not contribute to search engine rankings or visibility.
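Before blocking anything, it helps to see which user agents are actually generating the traffic. The sketch below is one way to do that, not part of my original setup. It assumes Apache is writing the "combined" log format and that the log lives at /var/log/apache2/access.log (the usual Ubuntu location, but check yours). The function name top_user_agents is just an illustrative choice.

```shell
# Print the most frequent User-Agent strings in an Apache access log.
# Assumption: "combined" log format, where the User-Agent is the sixth
# field when each line is split on double quotes.
top_user_agents() {
    # $1 = path to the access log, $2 = number of rows to show (default 10)
    awk -F'"' '{ print $6 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Usage on the server (path is an assumption, adjust as needed):
#   top_user_agents /var/log/apache2/access.log 20
```

Each output row is a request count followed by the user agent string, so the heaviest crawlers appear at the top.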

Step-by-Step: How to Block Bad Bots in Apache (Ubuntu Server)

1. Create a New Apache Config File to Identify Good and Bad Bots

Create a new file called blockbots.conf in your Apache configuration directory:

sudo nano /etc/apache2/conf-available/blockbots.conf

Paste in the following configuration:

<IfModule mod_setenvif.c>
    # Flag known bad bots (case-insensitive match anywhere in the User-Agent)
    SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot
    SetEnvIfNoCase User-Agent "SemrushBot" bad_bot
    SetEnvIfNoCase User-Agent "MJ12bot" bad_bot
    SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
    SetEnvIfNoCase User-Agent "Bytespider" bad_bot
    SetEnvIfNoCase User-Agent "DotBot" bad_bot
    SetEnvIfNoCase User-Agent "PetalBot" bad_bot

    # Broad catch-all patterns: these also flag any other agent whose
    # name contains "crawl" or "spider"
    SetEnvIfNoCase User-Agent "crawl" bad_bot
    SetEnvIfNoCase User-Agent "spider" bad_bot

    # Good bots: listed last so they clear bad_bot if a pattern above matched
    SetEnvIfNoCase User-Agent "Googlebot" good_bot !bad_bot
    SetEnvIfNoCase User-Agent "Bingbot" good_bot !bad_bot
</IfModule>

<Directory "/var/www/html">
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</Directory>
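The "crawl" and "spider" patterns are broad, so it is worth checking a few real user agent strings against the list before going live. The sketch below only approximates the matching with grep on your own machine and never talks to Apache. The pattern list is copied from the bad_bot lines above, and the function name classify_user_agent is a hypothetical helper, not an Apache tool.

```shell
# Report whether a User-Agent string matches any bad-bot pattern.
# grep -i gives the same case-insensitive, match-anywhere behaviour
# that SetEnvIfNoCase applies to these plain-text patterns.
BAD_BOT_PATTERN='AhrefsBot|SemrushBot|MJ12bot|Baiduspider|Bytespider|DotBot|PetalBot|crawl|spider'

classify_user_agent() {
    if printf '%s\n' "$1" | grep -qiE "$BAD_BOT_PATTERN"; then
        echo "blocked"
    else
        echo "allowed"
    fi
}

# Usage:
#   classify_user_agent "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
#   classify_user_agent "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```

If a user agent you care about comes back as blocked, tighten or drop the catch-all pattern that caught it.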

2. Modify Your Apache Virtual Host File

Edit your website’s virtual host file. Example:

sudo nano /etc/apache2/sites-available/yourdomain.com.conf

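On a stock Ubuntu install, apache2.conf already loads every file in conf-enabled/, so enabling the config in step 3 is enough on its own. If your setup does not do that, include the file explicitly in the virtual host: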

IncludeOptional /etc/apache2/conf-enabled/blockbots.conf

3. Enable the Necessary Apache Modules and Config

Enable required modules and the new config file:

sudo a2enmod authz_core
sudo a2enmod setenvif
sudo a2enconf blockbots
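To confirm that both modules really are loaded, you can feed the module list into a small check like the one below. This is a sketch under the assumption that apache2ctl -M prints names in the form authz_core_module and setenvif_module. The function name missing_modules is made up for this example.

```shell
# Read `apache2ctl -M` output on stdin and print any required module
# that is not listed. Prints nothing when both modules are loaded.
missing_modules() {
    loaded="$(cat)"
    for mod in authz_core_module setenvif_module; do
        if ! printf '%s\n' "$loaded" | grep -q "$mod"; then
            echo "$mod"
        fi
    done
}

# Usage on the server:
#   sudo apache2ctl -M | missing_modules
#   ls -l /etc/apache2/conf-enabled/blockbots.conf   # a2enconf should have created this link
```

No output from the first command means nothing is missing.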

4. Reload Apache to Apply the Changes

Always check your Apache configuration for syntax errors before reloading:

sudo apachectl configtest

If the output is Syntax OK, proceed:

sudo systemctl reload apache2

How to Test If Bad Bots Are Blocked

You can use curl to simulate bot access. For example:

curl -I -A "AhrefsBot" https://www.storeshoppe.com

If the configuration is working correctly, you should see a 403 Forbidden response instead of 200 OK.
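Checking one agent at a time gets tedious, so here is a small sketch that tries several user agents in a row and prints the status code for each. The function names status_for_agent and check_agents are my own illustrative helpers, and https://example.com in the usage comment is a placeholder for your own site.

```shell
# Print the HTTP status code a URL returns for a given User-Agent.
status_for_agent() {
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    curl -s -o /dev/null -w '%{http_code}' -A "$1" "$2"
}

# Check several agents against one site and print "agent -> status".
check_agents() {
    url="$1"
    shift
    for agent in "$@"; do
        printf '%s -> %s\n' "$agent" "$(status_for_agent "$agent" "$url")"
    done
}

# Usage (replace the URL with your own site):
#   check_agents https://example.com "AhrefsBot" "SemrushBot" "Googlebot" "Mozilla/5.0"
```

With the block in place, the flagged agents should come back with 403 while Googlebot and a normal browser string should not. A redirect on your site may show up as 301 or 302 for the allowed agents instead of 200.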

Results: Huge Performance Improvements

After applying this method:

  • My server’s CPU usage dropped drastically
  • No more unexplained spikes in server load
  • Crawling activity is now limited to legitimate bots
  • Website uptime and response speed improved
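If you want to check results like these on your own server, the access log shows which agents are now being refused. The sketch below counts 403 responses per user agent. It assumes the "combined" log format and the usual Ubuntu log path, and the function name blocked_by_agent is only an example.

```shell
# Count requests answered with 403, grouped by User-Agent.
# Assumption: "combined" log format. Split on double quotes, field 3 is
# " status bytes " and field 6 is the User-Agent.
blocked_by_agent() {
    awk -F'"' '{ split($3, f, " "); if (f[1] == "403") print $6 }' "$1" \
        | sort | uniq -c | sort -rn
}

# Usage on the server (path is an assumption, adjust as needed):
#   blocked_by_agent /var/log/apache2/access.log
```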

Final Thoughts

Blocking abusive bots at the Apache level is one of the most effective ways to protect your server from overload. Unlike robots.txt, which only well-behaved bots honor, this method enforces the restriction at the web server layer, so any bot that identifies itself with a flagged User-Agent is refused whether it likes it or not.

If you’re running a website on an Ubuntu server with Apache, I highly recommend you implement this solution. Your server (and your visitors) will thank you.
