There are good bots and there are bad bots. Some follow common sense and crawl at a speed that does not put excessive load on your server. Others act greedily and hammer your site just fast enough to bring your server to its knees, yet slowly enough to escape detection by your firewall. Chances are, if you run a big enough website or forum, you have already seen the ugly side of such greedy bots.
I know I have!
I have seen bots from Russian and Chinese search engines that continuously hammer a site to build their search index, which is never going to bring you much traffic, if any. I have also seen bots from social media management (SMM) companies and other “big data” firms, which only care about scouring the web for data they can resell at a premium to large corporations and do not give two hoots about the poor server admin or the real visitors of the site!
Thankfully, those Russian and Chinese search engines adhere to the robots.txt file, and adding a simple crawl delay stops them from hammering your server:
User-agent: search engine bot name
Crawl-delay: 30 # asks the bot to wait 30 seconds between requests
Bots from those SMM and big data firms rarely do that. Instead, they keep changing their IPs until you manage to block their entire range at your firewall. There are also scrapers, spammers and all sorts of other nasty bots that act in a similar manner and cause your server load to shoot up.
Because of this, I have learned to keep an eye on the server load. Whenever I see it shooting up for no apparent reason, I dig into the server logs, find the IPs that have been requesting a large number of pages, and run them through the various whois tools available to see whether each one is an end-user IP or belongs to a data center (DC). If it belongs to a DC, I normally block it without a second thought.
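For anyone who would rather not eyeball the logs, here is a minimal sketch of the "find the IPs requesting the most pages" step. It assumes your access log uses the common or combined log format, where the client IP is the first field on each line; the function name top_requesters and the sample lines are made up for illustration, so adjust the pattern if your server logs differently.

```python
import re
from collections import Counter

# Assumption: common/combined log format, with the client IP as the first field.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "')


def top_requesters(lines, limit=10):
    """Return up to `limit` (ip, request_count) pairs, busiest first."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines that do not look like log entries
            counts[match.group(1)] += 1
    return counts.most_common(limit)


# Made-up sample lines; in practice you would pass an open log file instead.
sample = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.4 - - [10/Oct/2023:13:55:37 +0000] "GET /b HTTP/1.1" 200 812 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2023:13:55:38 +0000] "GET /c HTTP/1.1" 200 112 "-" "ExampleBot/1.0"',
]

for ip, hits in top_requesters(sample):
    print(hits, ip)
```

The IPs at the top of that list are the ones worth running through a whois lookup. A high count alone does not prove an IP is a bot, so treat it as a shortlist rather than a block list.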
For technically proficient server admins, taking care of such bots may be as easy as whipping up a script to detect and block their user agent. For non-technical administrators like me, things aren’t that easy, and that is one of the primary reasons I have begun evaluating CloudFlare, which claims to block such bad bots automatically.
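To give a rough idea of what the detection half of such a script might look like, here is a small sketch. It assumes the combined log format, where the user agent is the last quoted field on each line. The names in BLOCKED_AGENT_HINTS are hypothetical placeholders, not real bots, and the actual blocking would still have to happen in your web server or firewall configuration.

```python
import re

# Assumption: combined log format, with the user agent as the last quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')

# Hypothetical substrings to look for; replace with agents seen in your own logs.
BLOCKED_AGENT_HINTS = ("examplescraper", "baddatabot")


def user_agent_of(line):
    """Return the user agent string from a log line, or None if not found."""
    match = USER_AGENT.search(line)
    return match.group(1) if match else None


def is_blocked(user_agent, hints=BLOCKED_AGENT_HINTS):
    """Return True if the user agent contains any hint (case-insensitive)."""
    lowered = user_agent.lower()
    return any(hint in lowered for hint in hints)


# Made-up sample line for illustration.
line = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "ExampleScraper/2.1"'
agent = user_agent_of(line)
print(agent, is_blocked(agent))
```

Keep in mind that a user agent is whatever the bot chooses to send, so a match like this only catches bots that identify themselves honestly.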
To cut a long story short, if you too are experiencing excessive load on your server with no real jump in end-user numbers, chances are you too are a victim of these bad bots. So before deciding to upgrade your server or spend money hiring a system administrator to tweak and optimize it, go through your server logs and see whether you are being hammered by a bot. (cPanel users can also click on Latest Visitors in cPanel to see recent visitors along with their IPs and user agents.)