One of our colo clients was complaining that his site was slow.
Took a look, and although load was only slightly above normal, he was doing a substantial amount of traffic throughput.
As he has multiple *busy* sites on his server, it was easier to take a look at iftop to see what was leeching the most traffic.
I could immediately see that he had a couple of spiders indexing one of his sites.
Normally this isn’t a huge issue, as they usually play nicely.
In this case, though, several spiders were holding multiple connections open at once.
Our robots.txt looked something like this:
cat robots.txt
User-agent: *
Disallow: /members/
Disallow: /activity/
Disallow: /ko/
Disallow: /fr/
Disallow: /zh/
Disallow: /th/
Disallow: /vi/
Disallow: /es/
Disallow: /ja/
Disallow: /it/
Disallow: /ru/
Disallow: /ar/
Disallow: /fi/
Disallow: /pl/
Disallow: /nl/
Disallow: /pt/
Disallow: /he/
Disallow: /no/
Disallow: /sv/
Disallow: /zh-tw/
Disallow: /cs/
Disallow: /de/
Disallow: /uk/
Disallow: /el/
Disallow: /tr/
A quick check of the site logs, filtered by spider, showed that the majority of the traffic was coming from Bing / MSN.
There were at least 10 to 15 spiders indexing simultaneously. Not only that, but Bing / MSN was busy indexing all the lovely pages that we’d explicitly excluded in the site’s robots.txt file.
*And* it was downloading the robots.txt file, then totally ignoring it.
207.46.195.241 - - [18/May/2012:03:15:04 +0000] "GET /robots.txt HTTP/1.1" 404 280 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.195.241 - - [18/May/2012:03:15:57 +0000] "GET / HTTP/1.1" 500 975 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.195.241 - - [18/May/2012:03:17:29 +0000] "GET / HTTP/1.0" 500 1534 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.195.241 - - [18/May/2012:04:17:30 +0000] "GET / HTTP/1.0" 500 1534 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.195.234 - - [18/May/2012:12:57:16 +0000] "GET /no/members/kxtjanio/activity/ HTTP/1.1" 200 14555 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:16 +0000] "GET /tr/members/poluden/activity/groups/?acpage=14 HTTP/1.1" 200 17927 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:16 +0000] "GET /zh-tw/members/bluezat/forums/ HTTP/1.1" 200 14633 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:18 +0000] "GET /ar/members/filozofem/activity/1643 HTTP/1.1" 200 12348 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:18 +0000] "GET /pt/members/chwacker/activity/groups/ HTTP/1.1" 200 18675 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:20 +0000] "GET /ru/members/maklare/points/points/ HTTP/1.1" 200 14945 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:21 +0000] "GET /pl/members/ken0115/activity/groups/?acpage=7 HTTP/1.1" 200 17936 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:21 +0000] "GET /zh/members/halilfree82/activity/favorites/ HTTP/1.1" 200 15677 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:23 +0000] "GET /zh/members/elwe/activity/mentions/ HTTP/1.1" 200 15670 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:23 +0000] "GET /no/members/afsaneh/activity/friends/?acpage=5 HTTP/1.1" 200 17101 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:25 +0000] "GET /tr/members/ahuy/activity/friends/ HTTP/1.1" 200 17458 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:25 +0000] "GET /members/zarus/activity/groups/?acpage=3 HTTP/1.1" 200 18131 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.196 - - [18/May/2012:12:57:26 +0000] "GET /ko/members/daniel/activity/friends/?acpage=4 HTTP/1.1" 200 18598 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.108.69 - - [18/May/2012:12:57:30 +0000] "GET /fr/members/poluden/activity/ HTTP/1.1" 200 15480 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:57:31 +0000] "GET /fr/activity/?acpage=14 HTTP/1.1" 200 17299 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:57:31 +0000] "GET /zh/members/zarus/activity/groups/blog.letsfx.com?acpage=1 HTTP/1.1" 200 19401 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:57:33 +0000] "GET /th/members/chwacker/activity/friends/?acpage=2 HTTP/1.1" 200 16651 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.198 - - [18/May/2012:12:57:34 +0000] "GET /vi/members/natvp/activity/groups/?acpage=5 HTTP/1.1" 200 20382 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.198 - - [18/May/2012:12:57:34 +0000] "GET /th/members/helen/activity/groups/?acpage=3 HTTP/1.1" 200 18099 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:35 +0000] "GET /groups/bep-study-buddies/activity/2971/ HTTP/1.1" 200 13887 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:35 +0000] "GET /ko/members/ahmedv/blogs/ HTTP/1.1" 200 15091 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:42 +0000] "GET /fr/members/cluadiomasu/points/ HTTP/1.1" 200 14172 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:42 +0000] "GET /ja/members/shengmao8620/groups/ HTTP/1.1" 200 15219 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:45 +0000] "GET /es/blog/tag/presentations-2/ HTTP/1.1" 200 17037 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:45 +0000] "GET /it/members/phuong.vo/activity/groups/ HTTP/1.1" 200 18216 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:48 +0000] "GET /fi/members/stella85/activity/1086 HTTP/1.1" 200 12350 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:48 +0000] "GET /ru/members/stella85/activity/groups/?acpage=12 HTTP/1.1" 200 21159 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:49 +0000] "GET /zh/members/kris/activity/friends/ HTTP/1.1" 200 18838 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:50 +0000] "GET /fr/members/cheryl/activity/groups/?acpage=14 HTTP/1.1" 200 19312 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:51 +0000] "GET /vi/members/vecttra/activity/groups/?acpage=4 HTTP/1.1" 200 17180 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:53 +0000] "GET /fi/members/?s=Intermediate&upage=1 HTTP/1.1" 200 13955 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.109.200 - - [18/May/2012:12:57:53 +0000] "GET /ar/members/test-user/activity/groups/?acpage=9 HTTP/1.1" 200 17499 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:57:53 +0000] "GET /members/admin/friends HTTP/1.1" 200 14760 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:57:53 +0000] "GET /members/lasso/activity/groups/?acpage=3 HTTP/1.1" 200 18101 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.51 - - [18/May/2012:12:57:58 +0000] "GET /fi/members/hoangdtv7986/friends HTTP/1.1" 200 13741 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.17 - - [18/May/2012:12:58:01 +0000] "GET /ru/members/nikoletth/points/ HTTP/1.1" 200 14970 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:01 +0000] "GET /ko/members/muratoncel3438/activity/ HTTP/1.1" 200 16100 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:01 +0000] "GET /fr/members/jack/activity/ HTTP/1.1" 200 15425 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:03 +0000] "GET /pl/members/filozofem/blogs/ HTTP/1.1" 200 13573 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:08 +0000] "GET /nl/members/mohamedyahia/activity/friends/?acpage=10 HTTP/1.1" 200 17835 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:08 +0000] "GET /pt/members/augert/activity/favorites/ HTTP/1.1" 200 15187 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:10 +0000] "GET /ko/members/chima78/activity/favorites/ HTTP/1.1" 200 15851 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.151 - - [18/May/2012:12:58:14 +0000] "GET /he/members/wildthing/activity/friends/ HTTP/1.1" 200 16883 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.51 - - [18/May/2012:12:58:21 +0000] "GET /no/members/filiz/activity/friends/?acpage=7 HTTP/1.1" 200 16440 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.108.69 - - [18/May/2012:12:58:30 +0000] "GET /members/david/activity/groups/?acpage=8 HTTP/1.1" 200 17208 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.51 - - [18/May/2012:12:58:30 +0000] "GET /pt/members/moniques/activity/groups/?acpage=7 HTTP/1.1" 200 18934 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.17 - - [18/May/2012:12:58:31 +0000] "GET /sv/groups/bep-study-buddies/members/?mlpage=7 HTTP/1.1" 200 14784 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.108.69 - - [18/May/2012:12:58:30 +0000] "GET /zh-tw/members/marcos/activity/friends/?acpage=2 HTTP/1.1" 200 18219 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.17 - - [18/May/2012:12:58:31 +0000] "GET /ja/members/sluconi/activity/groups/?acpage=3 HTTP/1.1" 200 21509 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.147 - - [18/May/2012:12:58:32 +0000] "GET /cs/members/joyull/activity/favorites/ HTTP/1.1" 200 14216 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.108.69 - - [18/May/2012:12:58:32 +0000] "GET /zh-tw/members/moniques/activity/friends/?acpage=15 HTTP/1.1" 200 18509 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.147 - - [18/May/2012:12:58:32 +0000] "GET /pt/members/luquejee/activity/groups/?acpage=12 HTTP/1.1" 200 18612 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.52.110.17 - - [18/May/2012:12:58:35 +0000] "GET /fi/activity/?acpage=66 HTTP/1.1" 200 16466 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.51 - - [18/May/2012:12:58:58 +0000] "GET /members/filozofem/profile/ HTTP/1.1" 200 13354 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
^C
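For anyone wanting to reproduce that kind of log check, here is a rough sketch that counts requests per source IP for a given user-agent string. The function name `count_bot_hits` is my own placeholder, and it assumes the log is in the same combined format as the lines above, with the client IP in the first field.

```shell
# Count requests per client IP for log lines containing a user-agent substring.
# Assumes the combined log format shown above (field 1 = client IP).
# count_bot_hits is a hypothetical helper name, not a standard tool.
count_bot_hits() {
    # $1 = path to the access log, $2 = user-agent substring, e.g. bingbot
    awk -v ua="$2" '
        index($0, ua) { hits[$1]++ }                    # tally matching lines by IP
        END { for (ip in hits) print hits[ip], ip }     # one "count ip" line per IP
    ' "$1" | sort -rn                                   # busiest IPs first
}
```

Something like `count_bot_hits access.log bingbot | head` (the log file name here is an assumption) lists the busiest crawler IPs first, which makes the ranges in use fairly obvious.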
The ranges in use by Bing appear to be 207.46.*, 65.52.*, and 157.55.17.*
A quick check to see who owns those ranges confirms that yes, it is indeed the evil empire.
NetRange: 65.52.0.0 - 65.55.255.255
CIDR: 65.52.0.0/14
OriginAS:
NetName: MICROSOFT-1BLK
NetHandle: NET-65-52-0-0-1
Parent: NET-65-0-0-0-0
NetType: Direct Assignment
RegDate: 2001-02-14
Updated: 2012-03-20
Ref: http://whois.arin.net/rest/net/NET-65-52-0-0-1
NetRange: 207.46.0.0 - 207.46.255.255
CIDR: 207.46.0.0/16
OriginAS:
NetName: MICROSOFT-GLOBAL-NET
NetHandle: NET-207-46-0-0-1
Parent: NET-207-0-0-0-0
NetType: Direct Assignment
RegDate: 1997-03-31
Updated: 2004-12-09
Ref: http://whois.arin.net/rest/net/NET-207-46-0-0-1
NetRange: 157.54.0.0 - 157.60.255.255
CIDR: 157.60.0.0/16, 157.56.0.0/14, 157.54.0.0/15
OriginAS: AS8075
NetName: MSFT-GFS
NetHandle: NET-157-54-0-0-1
Parent: NET-157-0-0-0-0
NetType: Direct Assignment
Comment: Abuse complaints will only be responded to if sent to abuse@microsoft.com and abuse@msn.com.
RegDate: 1994-04-28
Updated: 2010-08-19
Ref: http://whois.arin.net/rest/net/NET-157-54-0-0-1
As you can see, they do have an abuse email contact. Which bounces.
Need I say anything more?
The logs made it easy to see that they were completely ignoring the file, even *after* downloading it (e.g. a request for the robots.txt file, then more requests from the same IP for folders explicitly denied inside it!), so I decided to take some action to block them.
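A minimal sketch of that cross-check, for the curious: it pulls the `Disallow:` prefixes out of robots.txt and prints every logged request whose path starts with one of them. `disallowed_hits` is a made-up helper name, and it only understands plain prefix rules under a single `User-agent: *` group like the file above (no wildcards, no `Allow:` lines), so treat it as an illustration rather than a real robots.txt parser.

```shell
# Print "ip path" for every logged request whose path begins with a
# Disallow: prefix from robots.txt.
# Assumptions: combined log format (field 1 = IP, field 7 = request path),
# and a robots.txt with simple prefix rules only.
# disallowed_hits is a hypothetical helper name.
disallowed_hits() {
    # $1 = robots.txt, $2 = access log
    awk '
        # First file (robots.txt): remember each non-empty Disallow prefix.
        NR == FNR {
            if ($1 == "Disallow:" && $2 != "") rules[$2] = 1
            next
        }
        # Second file (access log): report paths that start with any prefix.
        {
            for (r in rules) {
                if (index($7, r) == 1) { print $1, $7; break }
            }
        }
    ' "$1" "$2"
}
```

Run against a log that has already been filtered down to one crawler, every line it prints is a request that a well-behaved spider should not have made.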
The following will block MSN Bot (Bing) from hammering a site.
#Block 65.52.* (whois CIDR 65.52.0.0/14, i.e. 65.52.* to 65.55.*)
iptables -A INPUT -s 65.52.0.0/14 -j DROP
#Block 207.46.* (whois CIDR 207.46.0.0/16)
iptables -A INPUT -s 207.46.0.0/16 -j DROP
#Block 157.55.17.*
iptables -A INPUT -s 157.55.17.0/24 -j DROP
Note that the 3rd range actually goes from 157.54.0.0 – 157.60.255.255.
I wasn’t actually seeing any evilness from the rest of that range (157.56 – 157.60.* in particular), so I’ve left it alone. Letting some Bing stuff through is a good idea (assuming they can behave themselves), as we don’t want to lose SEO goodness on one of the less^H popular search engines.
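Before trusting a set of DROP rules, it is worth checking which addresses a CIDR block really covers. The following is a small sketch in plain POSIX shell arithmetic; `ip_to_int` and `in_cidr` are names I made up for illustration, and it only handles IPv4 dotted quads without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The parenthesised body runs in a subshell, so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1          # split the address into its four octets
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed (exit status 0) if address $1 falls inside CIDR block $2.
# in_cidr is a hypothetical helper, not part of iptables.
in_cidr() (
    net=${2%/*}        # network part, e.g. 157.55.17.0
    bits=${2#*/}       # prefix length, e.g. 24
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)
```

For example, `in_cidr 157.55.17.196 157.55.17.0/24 && echo covered` confirms that one of the crawler IPs from the log is caught by the third rule. It also shows why the prefix length matters: a /14 on 207.46.0.0 would take in everything from 207.44.* to 207.47.*, well beyond the /16 that whois reports for Microsoft.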
A quick tail of the logs later, and I could see that the multitude of bandwidth leeching MSN / Bing bots were gone. Plus, the site loaded much much faster.
A quick google (haha) for MSN / Bing spiders doing the same thing to others revealed that we aren’t alone, and a number of people complain about exactly the same issue.
According to Bing, they do respect the protocol.
My own findings, and a check of others’ findings, show that they do not.
This search might be of interest –
http://www.google.com/search?&q=bing+ignoring+robots.txt
We’re not the only ones –
http://techie-buzz.com/microsoft/bing-crawler-msnbot-stupid.html
http://www.semwisdom.com/blog/msnbot-stupid-plain-evil
As we’ve verified that the IP ranges in use by the crawlers are indeed owned by Microsoft, it’s pretty evident that they’re lying.
C’est la vie.