A few tricks up my sleeves – htaccess style

Posted on April 21st, 2008 by Donace in Webmaster
[Image: simple example graph of Web traffic at Wikipedia.org in December 2004]

It has been a few months, and traffic has been slowly flowing towards the site. With all this good traffic comes the bad: the spammers, the hackers and all the other miscreants.

To battle this I put in some anti-spam measures: a few honey traps for scrapers, and Akismet to get rid of comment spam. Yet, as always, some battle through.

One other way I have been trying to battle these miscreants is via my good old .htaccess, and below are a few tricks I use. (Most if not all credit for these goes to Perishable Press.)

My God! What bad spelling

One of the ways vulnerabilities in sites based on WordPress and Joomla are targeted is by manipulating URLs. These URLs are usually based on existing URLs with a few ‘nice’ alterations. So one way to battle this is to use this little trick.

#Spelling Correction
<IfModule mod_speling.c>
CheckSpelling On
</IfModule>

(Yes, ironically the module is called mod_speling!)

What this will do is check the requested URL against the files that actually exist and correct small misspellings and wrong capitalisation. It is a small piece of code and very useful, as it has the added benefit of increasing the usability of your site!
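
If fuzzy matching feels too generous, mod_speling can be limited to fixing upper/lower case only. A minimal sketch, assuming mod_speling is loaded and your host allows these directives in .htaccess; the CheckCaseOnly line is an illustration and not part of the snippet above:

```apache
# Spelling correction, restricted to case mistakes only
<IfModule mod_speling.c>
CheckSpelling On
# Only fix capitalisation (a request for /About.HTML can be sent to an existing /about.html)
CheckCaseOnly On
</IfModule>
```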

Java me not

A lot of offline scrapers and miscreants are coded in Java, and I have yet to see a Java-based browser. So adding this little snippet to your .htaccess file will keep clients that identify themselves as Java/1.0 at bay.

# Block Java/1.0
SetEnvIfNoCase User-Agent "Java/1.0" keep_out
<Limit GET POST>
order allow,deny
allow from all
deny from env=keep_out
</Limit>
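
The pattern above only matches agents that announce themselves as Java/1.0 (the dot matches any single character), so a client sending something like Java/1.6.0_05 would slip through. A hedged sketch of a broader variant, assuming you want to turn away every agent whose string starts with Java/ and that the server runs Apache 2.4, where Require replaces the older order/allow/deny lines:

```apache
# Flag any client whose User-Agent starts with "Java/" (assumption: all versions are unwanted)
SetEnvIfNoCase User-Agent "^Java/" keep_out
# Apache 2.4 syntax (mod_authz_core); on 2.2 keep the order/allow/deny block instead
<RequireAll>
    Require all granted
    Require not env keep_out
</RequireAll>
```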

You’re from where?

The power of the internet is in its anonymity, and those friends of ours who cause us grief usually like to use proxies to keep themselves at arm’s length.

# block proxy servers from site access
RewriteEngine on
RewriteCond %{HTTP:VIA}                 !^$ [OR]
RewriteCond %{HTTP:FORWARDED}           !^$ [OR]
RewriteCond %{HTTP:USERAGENT_VIA}       !^$ [OR]
RewriteCond %{HTTP:X_FORWARDED_FOR}     !^$ [OR]
RewriteCond %{HTTP:PROXY_CONNECTION}    !^$ [OR]
RewriteCond %{HTTP:XPROXY_CONNECTION}   !^$ [OR]
RewriteCond %{HTTP:HTTP_PC_REMOTE_ADDR} !^$ [OR]
RewriteCond %{HTTP:HTTP_CLIENT_IP}      !^$
RewriteRule ^(.*)$ - [F]

This little piece of code will refuse requests that arrive with the usual proxy headers. One thing to remember, though, is that some genuine users may also be using proxies, so you have to ask: is it worth the trade-off?
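
One likely reason a rule like this misses real proxies: %{HTTP:name} looks up the request header by its literal name, and headers on the wire are spelled with hyphens, not underscores. A minimal sketch using the hyphenated spellings of the better-known headers (the selection is an assumption, and anonymous proxies that strip these headers will still get through):

```apache
# Block requests carrying common proxy headers (hyphenated header names)
RewriteEngine on
RewriteCond %{HTTP:Via}              !^$ [OR]
RewriteCond %{HTTP:Forwarded}        !^$ [OR]
RewriteCond %{HTTP:X-Forwarded-For}  !^$ [OR]
RewriteCond %{HTTP:Proxy-Connection} !^$ [OR]
RewriteCond %{HTTP:Client-IP}        !^$
RewriteRule ^ - [F]
```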

Don’t change that!

As mentioned earlier, URL manipulation is one of the key methods of attack, and this is what Perishable Press has come up with to combat it: a nice piece of code that returns a 403 for any URL containing one of the listed strings.

# 2G Blacklist from Perishable Press
<IfModule mod_alias.c>
redirectmatch 403 \.inc
redirectmatch 403 alt=
redirectmatch 403 http://
redirectmatch 403 menu\.php
redirectmatch 403 main\.php
redirectmatch 403 file\.php
redirectmatch 403 home\.php
redirectmatch 403 view\.php
redirectmatch 403 about\.php
redirectmatch 403 order\.php
redirectmatch 403 errors\.php
redirectmatch 403 config\.php
redirectmatch 403 button\.php
redirectmatch 403 middle\.php
redirectmatch 403 threads\.php
redirectmatch 403 contact\.php
redirectmatch 403 display\.cgi
redirectmatch 403 display\.php
redirectmatch 403 include\.php
redirectmatch 403 register\.php
redirectmatch 403 db_connect\.php
redirectmatch 403 doeditconfig\.php
redirectmatch 403 send_reminders\.php
redirectmatch 403 admin_db_utilities\.php
redirectmatch 403 admin\.webring\.docs\.php
redirectmatch 403 keds\.lpti
redirectmatch 403 r\.verees
redirectmatch 403 pictureofmyself
redirectmatch 403 remoteFile
redirectmatch 403 mybabyboy
redirectmatch 403 mariostar
redirectmatch 403 zaperyan
redirectmatch 403 babyboy
redirectmatch 403 aboutme
redirectmatch 403 xAou6
redirectmatch 403 qymux
</IfModule>
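
Each pattern is a regular expression matched anywhere in the URL path, so the about.php line also forbids /pages/about.php or /about.php.bak. If one of the names clashes with a real page on your site, anchoring the pattern is one way to narrow it. A sketch, with /old/config.php as a made-up path:

```apache
<IfModule mod_alias.c>
# Forbid only this exact (hypothetical) path instead of every URL containing "config.php"
RedirectMatch 403 ^/old/config\.php$
</IfModule>
```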

Hey, what’s this string for?

Now there are some clever people out there, and the daddy of URL manipulation is query string manipulation. This little chunk helps you prevent that.

# Block out any script trying to modify a _REQUEST variable via URL
# (a condition only: it needs the RewriteRule from the last section directly after it)
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})

Run… it’s the Rogue Agents

This is a HUGE list by Perishable Press which contains a large number of bad user agents; blocking these rogues will help keep you alive!

# Ultimate htaccess Blacklist 2 from Perishable Press
# Deny domain access to spammers and other scumbags
# (the pattern is quoted because it contains spaces)
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} "ADSARobot|ah-ha|almaden|aktuelles|Anarchie|amzn_assoc|ASPSeek|ASSORT|ATHENS|Atomz|attach|attache|autoemailspider|BackWeb|Bandit|BatchFTP|bdfetch|big.brother|BlackWidow|bmclient|Boston Project|BravoBrian SpiderEngine MarcoPolo|Bot mailto:craftbot@yahoo.com|Buddy|Bullseye|bumblebee|capture|CherryPicker|ChinaClaw|CICC|clipping|Collector|Copier|Crescent|Crescent Internet ToolPak|Custo|cyberalert|DAS|Deweb|diagem|Digger|Digimarc|DIIbot|DISCo|DISCo Pump|DISCoFinder|Download Demon|Download Wonder|Downloader|Drip|DSurf15a|DTS.Agent|EasyDL|eCatch|ecollector|efp@gmx.net|Email Extractor|EirGrabber|email|EmailCollector|EmailSiphon|EmailWolf|Express WebPictures|ExtractorPro|EyeNetIE|FavOrg|fastlwspider|Favorites Sweeper|Fetch|FEZhead|FileHound|FlashGet WebWasher|FlickBot|fluffy|FrontPage|GalaxyBot|Generic|Getleft|GetRight|GetSmart|GetWeb!|GetWebPage|gigabaz|Girafabot|Go!Zilla|Go!Zilla|Go-Ahead-Got-It|GornKer|gotit|Grabber|GrabNet|Grafula|Green Research|grub-client|Harvest|hhjhj@yahoo|hloader|HMView|HomePageSearch|http generic|HTTrack|httpdown|httrack|ia_archiver|IBM_Planetwide|Image Stripper|Image Sucker|imagefetch|IncyWincy|Indy*Library|Indy Library|informant|Ingelin|InterGET|Internet Ninja|InternetLinkagent|Internet Ninja|InternetSeer.com|Iria|Irvine|JBH*agent|JetCar|JOC|JOC Web Spider|JustView|KWebGet|Lachesis|larbin|LeechFTP|LexiBot|lftp|libwww|likse|Link|Link*Sleuth|LINKS ARoMATIZED|LinkWalker|LWP|lwp-trivial|Mag-Net|Magnet|Mac Finder|Mag-Net|Mass Downloader|MCspider|Memo|Microsoft.URL|MIDown tool|Mirror|Missigua Locator|Mister PiX|MMMtoCrawl/UrlDispatcherLLL|^Mozilla$|Mozilla.*Indy|Mozilla.*NEWT|Mozilla*MSIECrawler|MS FrontPage*|MSFrontPage|MSIECrawler|MSProxy|multithreaddb|nationaldirectory|Navroad|NearSite|NetAnts|NetCarta|NetMechanic|netprospector|NetResearchServer|NetSpider|Net Vampire|NetZIP|NetZip Downloader|NetZippy|NEWT|NICErsPRO|Ninja|NPBot|Octopus|Offline Explorer|Offline Navigator|OpaL|Openfind|OpenTextSiteCrawler|OrangeBot|PageGrabber|Papa Foto|PackRat|pavuk|pcBrowser|PersonaPilot|Ping|PingALink|Pockey|Proxy|psbot|PSurf|puf|Pump|PushSite|QRVA|RealDownload|Reaper|Recorder|ReGet|replacer|RepoMonkey|Robozilla|Rover|RPT-HTTPClient|Rsync|Scooter|SearchExpress|searchhippo|searchterms.it|Second Street Research|Seeker|Shai|Siphon|sitecheck|sitecheck.internetseer.com|SiteSnagger|SlySearch|SmartDownload|snagger|Snake|SpaceBison|Spegla|SpiderBot|sproose|SqWorm|Stripper|Sucker|SuperBot|SuperHTTP|Surfbot|SurfWalker|Szukacz|tAkeOut|tarspider|Teleport Pro|Templeton|TrueRobot|TV33_Mercator|UIowaCrawler|UtilMind|URLSpiderPro|URL_Spider_Pro|Vacuum|vagabondo|vayala|visibilitygap|VoidEYE|vspider|Web Downloader|w3mir|Web Data Extractor|Web Image Collector|Web Sucker|Wweb|WebAuto|WebBandit|web.by.mail|Webclipping|webcollage|webcollector|WebCopier|webcraft@bea|webdevil|webdownloader|Webdup|WebEMailExtrac|WebFetch|WebGo IS|WebHook|Webinator|WebLeacher|WEBMASTERS|WebMiner|WebMirror|webmole|WebReaper|WebSauger|Website|Website eXtractor|Website Quester|WebSnake|Webster|WebStripper|websucker|webvac|webwalk|webweasel|WebWhacker|WebZIP|Wget|Whacker|whizbang|WhosTalking|Widow|WISEbot|WWWOFFLE|x-Tractor|^Xaldon WebSpider|WUMPUS|Xenu|XGET|Zeus.*Webster|Zeus" [NC]
RewriteRule ^.* - [F,L]

Get back in there

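All these protections are bound to hit the one-in-a-billion case where an actual reader accidentally does the wrong thing. Despite what its comment suggests, this rule does not send anyone back to the homepage: the F flag makes Apache answer with 403 Forbidden, and the index.php target is never served. It is the rule that completes the _REQUEST condition from earlier, so it must sit directly after that RewriteCond; left on its own it would forbid every request.

# Send all blocked request to homepage with 403 Forbidden error!
# (must directly follow the RewriteCond lines it belongs to)
RewriteRule ^(.*)$ index.php [F,L]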
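
Put together, the query-string condition and the blocking rule could look like the following. This is a sketch only: the - target (meaning "no substitution") is a swap for index.php, since the 403 is sent either way.

```apache
RewriteEngine on
# Condition: the query string tries to set or modify _REQUEST
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Rule: answer matching requests with 403 Forbidden
RewriteRule ^ - [F,L]
```

Keeping each RewriteCond directly above the RewriteRule it belongs to matters, because a condition applies only to the very next rule in the file.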

Aaaaaand that’s all folks! Those are all my security-related .htaccess tricks for now. Remember: before testing them, back up your existing .htaccess, and don’t try them on a live server first!

Note: for those of you who want the quick option, I have attached a custom htaccess to this post with all the security options here, as well as a few performance tweaks and a HUGE list of bad IPs blocked.

All you have to do is upload it to your server and rename it from ‘htaccess.txt’ to ‘.htaccess’ (note the added dot/full stop).

Right click / Save as Here

Enjoy!


11 Comments

  • At 2008.04.24 10:46, David said:

    I’ve tried the .htaccess code for blocking proxy sites. It didn’t seem to work for proxylord.com . Then it could be a delay or the way the servers are set up with Go Daddy.

    Thanks for the tips and tricks!

    David

    • At 2008.04.25 04:28, Donace said:

      To tell you the truth, I didn’t extensively test the proxy script, as a lot of my traffic comes via proxies (a lot of ISPs in third-world/developing countries use them).

      I will though test it out on my test site and let you know how to fix that issue.

    • At 2008.06.11 13:22, tink said:

      I’m in the process of replacing a site and I’m working on the htaccess file to handle the redirects from the old site as well as a couple other things like pointing to custom 404 error pages.
      Everything you cover is what I want to include in my htaccess file as well. However, you mention that you are focused on WordPress- and Joomla-based sites with this htaccess file configuration. Are there any caveats for regular HTML sites like mine?

      For example:
      I assume the following would be switched from .php to .html
      # Send all blocked request to homepage with 403 Forbidden error!
      RewriteRule ^(.*)$ index.php [F,L]
      #changed to
      RewriteRule ^(.*)$ index.html [F,L]

      Is there anything else?
      I’m a designer not a code jock so while I follow things like this I can find myself off track after the fact and I don’t want to mess up the htaccess file.

      One mess up I did as starters was write:
      IndexIgnore *
      # instead of
      Options -Indexes

      • At 2008.06.12 03:12, Donace said:

        Hey tink; glad you found the post useful. I’m not actually a ‘designer’ or a ‘code jock’; this is just stuff I have learnt along the way.

        Though I would assume that changing it to .html as opposed to .php would work. Here, though, I would stress trying it in a test environment first, just to make sure, and keeping an eye on the error logs after the change.

        I would also urge you to take a look at this site: http://perishablepress.com/
        As he has helped me a lot in the matter.

      • At 2008.06.13 13:58, tink said:

        Cool, thanks for the link. I’m on may way…
        on my way through google

        • At 2008.08.29 08:55, Want a cookie? said:

          [...] give it my best shot. If you want to learn more about htaccess have a quick snoop on one of my older articles and a concise write up by Jeff over at perishable [...]

          • At 2008.11.17 13:49, AskApache said:

            Nice writeup Donace, it’s refreshing to see a good .htaccess article like this.. keep it up!

            • At 2008.11.17 14:41, Donace said:

              Thanks man but I cannot take all the credit here. The ‘code magic’ here and in the following article ‘htaccess reviewed’ is based on the code provided by you and Jeff over at perishable press…so YOU guyz keep it up!

            • At 2009.01.13 21:57, Nihar said:

              Donace, this post has stumped me!
              First I installed the wp-comment-stopper plugin. It was working fine, but my readers using IE were unable to do so. Then I switched to Peter’s Anti-spam plugin, but it was not working properly because I am using wp-super-cache. This morning I installed WP-Spam-Free. But after reading this post, I think I should disable it for a while, put in all your entries and see whether spam is reduced or not.

              Thank you very much. This post deserves to be in my Friday Link Party.

              • At 2009.01.14 01:25, Donace said:

                WP-Spam-Free is very good, except it has the possibility of ‘false positives’, which is why I ignored it for my setup.
                I suggest you back up your htaccess first and apply the tweaks in conjunction with the ones in ‘htaccess reviewed’, as one or two have better variations in the later post.

                Also check out perishable press’s 3G blacklist; and let me know how it goes.

                • At 2009.08.29 04:55, How to stop comment spam | The Nexus said:

                  [...] of repeating what i’ve said elsewhere I urge you to read A few tricks up my sleeves – htaccess style and htaccess reviewed in which I have detailed a number of spam prevention techniques. (the later [...]
