OCDProgrammer.com

It's Microsoft's World, and I'm just living in it
Disclaimer

  • This is MY blog. The views represented here are not in relation to anybody else. Please read my full disclaimer for a more complete disclaimer.

Abusive Spiders - GateKeeper

January 11, 2009 15:39 by ckincincy


Chris Blankenship has been on a crusade lately against abusive spiders.  I was already interested in some of the fixes he was applying, and then a few weeks ago I got an email from him about a solution he was developing, ‘GateKeeper’.  I reviewed the code and it all looked good, but he wasn’t ready yet to fully release it into the wild.

It finally got to that point and I installed it on my two DotNetBlogEngine.net blogs. So far I have been really impressed with it.  I’m really interested to see how it affects my overall traffic.  Right now I have four blocked user agents:

baiduspider, larbin, sogou, sosospider.  All of those came on Chris’s recommendation.  Then I immediately got a violation from Slurp (Yahoo’s crawler), though I am going to give them one more failure before I block them.  Chris also has MSN blocked.  A lot of my traffic comes from Live Search, so I’m a little scared to do that.
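
To make the idea of a blocked user agent list concrete, here is a minimal sketch in Python. It is not GateKeeper's actual code (GateKeeper is a BlogEngine.NET extension, and I am not reproducing its source here); the function name `is_blocked` and the case-insensitive substring match are my own assumptions about how such a check could work.

```python
# Hypothetical sketch: match a request's User-Agent header against a block list.
# The four tokens are the ones listed in this post; everything else is assumed.
BLOCKED_AGENTS = ("baiduspider", "larbin", "sogou", "sosospider")


def is_blocked(user_agent, blocked=BLOCKED_AGENTS):
    """Return True if any blocked token appears in the User-Agent string.

    Matching is a case-insensitive substring test, since crawlers usually
    embed their name inside a longer header value.
    """
    ua = (user_agent or "").lower()
    return any(token in ua for token in blocked)


if __name__ == "__main__":
    print(is_blocked("Mozilla/5.0 (compatible; Baiduspider/2.0)"))
    print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

A substring match is deliberately loose: it catches version changes in the header, but it would also block any agent that happens to contain one of the tokens.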

I did run into one issue with the solution though.  When I installed it, I had it set to automatically block violators.  What neither Chris nor I knew is that Google caches the robots.txt file!  Since Google hadn’t picked up my new robots.txt file yet, it was blocked!  So I recommend leaving automatic blocking turned off for at least a few days after installing.
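
The lesson here can be sketched as a simple rule: do not auto-block anyone until crawlers have had time to refetch robots.txt, and even then allow a second chance. The Python below is only an illustration of that rule, not how GateKeeper implements it; the three-day grace period, the two-violation threshold, and the name `should_auto_block` are all assumptions of mine.

```python
from datetime import datetime, timedelta

# Assumption: "a few days" is taken as three days for this sketch.
GRACE_PERIOD = timedelta(days=3)
# Assumption: block on the second violation ("one more failure").
MAX_VIOLATIONS = 2


def should_auto_block(robots_changed_at, violation_at, violation_count,
                      grace=GRACE_PERIOD, max_violations=MAX_VIOLATIONS):
    """Decide whether a crawler should be blocked automatically.

    Violations inside the grace period are never blocked, because the
    crawler may still be working from a cached copy of the old robots.txt.
    """
    if violation_at - robots_changed_at < grace:
        return False
    return violation_count >= max_violations


if __name__ == "__main__":
    changed = datetime(2009, 1, 8, 12, 0)
    # One day after the change: still inside the grace period.
    print(should_auto_block(changed, changed + timedelta(days=1), 5))
    # Five days after the change, second violation: block.
    print(should_auto_block(changed, changed + timedelta(days=5), 2))
```

The dates in the usage example are made up purely to exercise the function.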

Related posts from Chris’s site:
The Continued Struggle With Spiders
To catch a spider…
Abusive Web Crawlers
Blocking Bad UserAgents and IP Addresses
The elusive Robots.txt file

