Support

Blog

One of our colo clients was complaining that his site was slow.

Took a look, and although load was only slightly above normal, he was doing a substantial amount of traffic throughput.

As he has multiple *busy* sites on his server, it was easier to take a look at iftop to see what was leeching the most traffic.

I could immediately see that he had a couple of spiders indexing one of his sites.
Normally this isn’t a huge issue, as they usually place nicely.

In this case, it seemed to be multiple connections from spiders.
Our robots.txt looked something like this –

cat robots.txt

User-agent: *
Disallow: /members/
Disallow: /activity/
Disallow: /ko/
Disallow: /fr/
Disallow: /zh/
Disallow: /th/
Disallow: /vi/
Disallow: /th/
Disallow: /es/
Disallow: /ja/
Disallow: /it/
Disallow: /ru/
Disallow: /ar/
Disallow: /fi/
Disallow: /pl/
Disallow: /nl/
Disallow: /pt/
Disallow: /he/
Disallow: /no/
Disallow: /sv/
Disallow: /zh-tw/
Disallow: /cs/
Disallow: /de/
Disallow: /uk/
Disallow: /el/
Disallow: /tr/

A quick check of site logs filtered by spiders showed that the majority of the traffic was coming from Bing / MSN.

There were at least 10 – 15 simultaneous spiders indexing. Not only that, but Bing / MSN was busy indexing all the lovely pages that we’d explicitly excluded in the sites robots.txt file.

*and* it was downloading the robots.txt file, then totally ignoring it.

207.46.195.241 - - [18/May/2012:03:15:04 +0000] "GET /robots.txt HTTP/1.1" 404 280 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.195.241 - - [18/May/2012:03:15:57 +0000] "GET / HTTP/1.1" 500 975 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.195.241 - - [18/May/2012:03:17:29 +0000] "GET / HTTP/1.0" 500 1534 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.195.241 - - [18/May/2012:04:17:30 +0000] "GET / HTTP/1.0" 500 1534 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.195.234 - - [18/May/2012:12:57:16 +0000] "GET /no/members/kxtjanio/activity/ HTTP/1.1" 200 14555 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:16 +0000] "GET /tr/members/poluden/activity/groups/?acpage=14 HTTP/1.1" 200 17927 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:16 +0000] "GET /zh-tw/members/bluezat/forums/ HTTP/1.1" 200 14633 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:18 +0000] "GET /ar/members/filozofem/activity/1643 HTTP/1.1" 200 12348 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:18 +0000] "GET /pt/members/chwacker/activity/groups/ HTTP/1.1" 200 18675 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:20 +0000] "GET /ru/members/maklare/points/points/ HTTP/1.1" 200 14945 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:21 +0000] "GET /pl/members/ken0115/activity/groups/?acpage=7 HTTP/1.1" 200 17936 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:21 +0000] "GET /zh/members/halilfree82/activity/favorites/ HTTP/1.1" 200 15677 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:23 +0000] "GET /zh/members/elwe/activity/mentions/ HTTP/1.1" 200 15670 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:23 +0000] "GET /no/members/afsaneh/activity/friends/?acpage=5 HTTP/1.1" 200 17101 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:25 +0000] "GET /tr/members/ahuy/activity/friends/ HTTP/1.1" 200 17458 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:25 +0000] "GET /members/zarus/activity/groups/?acpage=3 HTTP/1.1" 200 18131 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:26 +0000] "GET /ko/members/daniel/activity/friends/?acpage=4 HTTP/1.1" 200 18598 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.108.69 - - [18/May/2012:12:57:30 +0000] "GET /fr/members/poluden/activity/ HTTP/1.1" 200 15480 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:57:31 +0000] "GET /fr/activity/?acpage=14 HTTP/1.1" 200 17299 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:57:31 +0000] "GET /zh/members/zarus/activity/groups/blog.letsfx.com?acpage=1 HTTP/1.1" 200 19401 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:57:33 +0000] "GET /th/members/chwacker/activity/friends/?acpage=2 HTTP/1.1" 200 16651 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.198 - - [18/May/2012:12:57:34 +0000] "GET /vi/members/natvp/activity/groups/?acpage=5 HTTP/1.1" 200 20382 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.198 - - [18/May/2012:12:57:34 +0000] "GET /th/members/helen/activity/groups/?acpage=3 HTTP/1.1" 200 18099 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:35 +0000] "GET /groups/bep-study-buddies/activity/2971/ HTTP/1.1" 200 13887 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:35 +0000] "GET /ko/members/ahmedv/blogs/ HTTP/1.1" 200 15091 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:42 +0000] "GET /fr/members/cluadiomasu/points/ HTTP/1.1" 200 14172 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:42 +0000] "GET /ja/members/shengmao8620/groups/ HTTP/1.1" 200 15219 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:45 +0000] "GET /es/blog/tag/presentations-2/ HTTP/1.1" 200 17037 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:45 +0000] "GET /it/members/phuong.vo/activity/groups/ HTTP/1.1" 200 18216 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:48 +0000] "GET /fi/members/stella85/activity/1086 HTTP/1.1" 200 12350 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:48 +0000] "GET /ru/members/stella85/activity/groups/?acpage=12 HTTP/1.1" 200 21159 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:49 +0000] "GET /zh/members/kris/activity/friends/ HTTP/1.1" 200 18838 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:50 +0000] "GET /fr/members/cheryl/activity/groups/?acpage=14 HTTP/1.1" 200 19312 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:51 +0000] "GET /vi/members/vecttra/activity/groups/?acpage=4 HTTP/1.1" 200 17180 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:53 +0000] "GET /fi/members/?s=Intermediate&upage=1 HTTP/1.1" 200 13955 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:53 +0000] "GET /ar/members/test-user/activity/groups/?acpage=9 HTTP/1.1" 200 17499 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:57:53 +0000] "GET /members/admin/friends HTTP/1.1" 200 14760 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:57:53 +0000] "GET /members/lasso/activity/groups/?acpage=3 HTTP/1.1" 200 18101 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.51 - - [18/May/2012:12:57:58 +0000] "GET /fi/members/hoangdtv7986/friends HTTP/1.1" 200 13741 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.17 - - [18/May/2012:12:58:01 +0000] "GET /ru/members/nikoletth/points/ HTTP/1.1" 200 14970 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:01 +0000] "GET /ko/members/muratoncel3438/activity/ HTTP/1.1" 200 16100 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:01 +0000] "GET /fr/members/jack/activity/ HTTP/1.1" 200 15425 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:03 +0000] "GET /pl/members/filozofem/blogs/ HTTP/1.1" 200 13573 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:08 +0000] "GET /nl/members/mohamedyahia/activity/friends/?acpage=10 HTTP/1.1" 200 17835 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:08 +0000] "GET /pt/members/augert/activity/favorites/ HTTP/1.1" 200 15187 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:10 +0000] "GET /ko/members/chima78/activity/favorites/ HTTP/1.1" 200 15851 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:14 +0000] "GET /he/members/wildthing/activity/friends/ HTTP/1.1" 200 16883 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.51 - - [18/May/2012:12:58:21 +0000] "GET /no/members/filiz/activity/friends/?acpage=7 HTTP/1.1" 200 16440 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.108.69 - - [18/May/2012:12:58:30 +0000] "GET /members/david/activity/groups/?acpage=8 HTTP/1.1" 200 17208 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.51 - - [18/May/2012:12:58:30 +0000] "GET /pt/members/moniques/activity/groups/?acpage=7 HTTP/1.1" 200 18934 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.17 - - [18/May/2012:12:58:31 +0000] "GET /sv/groups/bep-study-buddies/members/?mlpage=7 HTTP/1.1" 200 14784 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.108.69 - - [18/May/2012:12:58:30 +0000] "GET /zh-tw/members/marcos/activity/friends/?acpage=2 HTTP/1.1" 200 18219 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.17 - - [18/May/2012:12:58:31 +0000] "GET /ja/members/sluconi/activity/groups/?acpage=3 HTTP/1.1" 200 21509 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.147 - - [18/May/2012:12:58:32 +0000] "GET /cs/members/joyull/activity/favorites/ HTTP/1.1" 200 14216 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.108.69 - - [18/May/2012:12:58:32 +0000] "GET /zh-tw/members/moniques/activity/friends/?acpage=15 HTTP/1.1" 200 18509 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.147 - - [18/May/2012:12:58:32 +0000] "GET /pt/members/luquejee/activity/groups/?acpage=12 HTTP/1.1" 200 18612 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.17 - - [18/May/2012:12:58:35 +0000] "GET /fi/activity/?acpage=66 HTTP/1.1" 200 16466 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.51 - - [18/May/2012:12:58:58 +0000] "GET /members/filozofem/profile/ HTTP/1.1" 200 13354 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
^C

The ranges in use by bing appear to be 207.46.*, and 65.52.*, 157.55.17.*
A quick check to see who owns those ranges confirms that yes, it is indeed the evil empire.


NetRange: 65.52.0.0 - 65.55.255.255
CIDR: 65.52.0.0/14
OriginAS:
NetName: MICROSOFT-1BLK
NetHandle: NET-65-52-0-0-1
Parent: NET-65-0-0-0-0
NetType: Direct Assignment
RegDate: 2001-02-14
Updated: 2012-03-20
Ref: http://whois.arin.net/rest/net/NET-65-52-0-0-1

NetRange: 207.46.0.0 - 207.46.255.255
CIDR: 207.46.0.0/16
OriginAS:
NetName: MICROSOFT-GLOBAL-NET
NetHandle: NET-207-46-0-0-1
Parent: NET-207-0-0-0-0
NetType: Direct Assignment
RegDate: 1997-03-31
Updated: 2004-12-09
Ref: http://whois.arin.net/rest/net/NET-207-46-0-0-1

NetRange: 157.54.0.0 - 157.60.255.255
CIDR: 157.60.0.0/16, 157.56.0.0/14, 157.54.0.0/15
OriginAS: AS8075
NetName: MSFT-GFS
NetHandle: NET-157-54-0-0-1
Parent: NET-157-0-0-0-0
NetType: Direct Assignment
Comment: Abuse complaints will only be responded to if sent to abuse@microsoft.com and abuse@msn.com.
RegDate: 1994-04-28
Updated: 2010-08-19
Ref: http://whois.arin.net/rest/net/NET-157-54-0-0-1

As you can see, they do have an abuse email contact. Which bounces.
Need I say anything more?

As I could readily identify that they were completely ignoring the file, even *after* downloading it from logs (eg see a request for the robots.txt file, then more requests for folders explicitly denied inside the robots.txt file! from the same ip), I decided to take some action to block them.

The following will block MSN Bot (Bing) from hammering a site.

#Block 207.46.*
iptables -A INPUT -s 65.52.0.0/14 -j DROP
#Block 65.52.*
iptables -A INPUT -s 207.46.0.0/14 -j DROP
#Block 157.55.17.*
iptables -A INPUT -s 157.55.17.0/24 -j DROP

Note that the 3rd range actually goes from 157.54.0.0 – 157.60.255.255
I wasn’t actually seeing any evilness from the 157.56 – 157.60.* range, so I’ve ignored them. Letting some Bing stuff through is a good idea (assuming they can behave themselves), as we don’t want to lose SEO goodness on one of the less^H popular search engines.

A quick tail of the logs later, and I could see that the multitude of bandwidth leeching MSN / Bing bots were gone. Plus, the site loaded much much faster.

A quick google (haha), for MSN / BING spiders doing the same thing to others revealed that we aren’t alone, and a number of people complain about exactly the same issue.

According to Bing, they do respect the protocol.

http://www.bing.com/community/site_blogs/b/webmaster/archive/2009/08/10/crawl-delay-and-the-bing-crawler-msnbot.aspx

My own findings, and a check of others findings show that they do not.

This search might be of interest –
http://www.google.com/search?&q=bing+ignoring+robots.txt

We’re not the only ones –
http://techie-buzz.com/microsoft/bing-crawler-msnbot-stupid.html

http://www.semwisdom.com/blog/msnbot-stupid-plain-evil

As we’ve verified that the ip ranges in use by the crawlers are indeed owned by Microsoft, its pretty evident that they’re lying.

C’est la vie.

Archives

Categories

Tags

PHOTOSTREAM