htaccess overview
Monday, 20 October 2008 09:20
introduction to .htaccess
This work in constant progress is some collected wisdom: stuff I've learned on the topic of .htaccess hacking, and commands I've used successfully in the past on a variety of server setups, and in most cases still do. You may have to tweak the examples a little to get the desired result, though, and a reliable test server, preferably one with a setup very similar to your "live" server, is a powerful ally. Okay, to begin...
htaccess files are invisible
There's a good reason why you won't see .htaccess files on the web: almost every web server in the world is configured to ignore them, by default. Most operating systems hide them too. Mainly it's the dot "." at the start, you see? If you don't see, you'll need to disable your operating system's invisible-file functions, or use a text editor that can open hidden files, something like BBEdit on the Mac platform. On Windows, showing invisibles in Explorer should allow any text editor to open them, and most decent editors to save them too. Linux dudes know how to find them without any help from me. In both images, the operating system has been instructed to display invisible files: ugly, but necessary sometimes. You will also need to instruct your FTP client to do the same.
What are .htaccess files anyway?
Simply put, they are invisible plain-text files in which you can store server directives. Server directives are anything you might put in an Apache config file (httpd.conf) or even a php.ini**, but unlike those "master" directive files, .htaccess directives apply only to the folder in which the .htaccess file resides, and to all the folders beneath it. This ability to plant .htaccess files in any directory of our site allows us to set up a finely grained tree of server directives, each subfolder inheriting properties from its parent while adding to, or overriding, certain directives with its own .htaccess file. For instance, you could use .htaccess to enable indexes all over your site and then deny indexing in only certain subdirectories, or deny index listings site-wide and allow indexing in certain subdirectories. One line in the .htaccess file in your root and your whole site is altered. From here on, I'll refer to the .htaccess in the root of your website as the "master" or "main" .htaccess file. There's a small performance penalty for all this: when .htaccess files are enabled, Apache looks for one in every directory along the path of each request. In practice it isn't noticeable, and most of the time it's just on and there's nothing you can do about it anyway, so let's make the most of it.
** Your main php.ini, that is, unless you are running under phpsuexec, in which case the directives go inside individual php.ini files.
What can I do with .htaccess files?
Many of the directives you can put inside an httpd.conf file will also work perfectly inside an .htaccess file, as long as the directive's documented context includes .htaccess and the server's AllowOverride setting permits it. Unsurprisingly, the most common use of .htaccess is to...
control access
.htaccess is most often used to restrict or deny access to individual files and folders. A typical example would be an "includes" folder. Your site's pages can call these included scripts all they like, but you don't want users accessing these files directly.
In that case you would drop an .htaccess file in the includes folder with content something like this..
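A minimal sketch of such a file follows. It uses the Apache 2.2-style Order/Deny lines of this article's era, with the newer Apache 2.4 Require form alongside; each is wrapped in an IfModule check so only the one your server understands is read:

```apache
# .htaccess for the includes/ folder: refuse every direct web request
<IfModule mod_authz_core.c>
    # Apache 2.4 and later
    Require all denied
</IfModule>
<IfModule !mod_authz_core.c>
    # Apache 2.2 and earlier
    Order Deny,Allow
    Deny from all
</IfModule>
```

PHP's include() reads these files straight from disk rather than through the web server, so your pages can still use them.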
which would deny ALL direct access to ANY files in that folder. You can be more specific with your conditions, for instance limiting access to a particular IP range. Here's a handy top-level rule for a local test server..
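Something along these lines, sketched in Apache 2.2 syntax; the 192.168.1 network is a hypothetical home LAN, so substitute your own:

```apache
# Local test server: turn everyone away except this machine and the LAN
Order Deny,Allow
Deny from all
# With Deny,Allow ordering, a matching Allow overrides the Deny above
Allow from 127.0.0.1
# A partial address matches the whole 192.168.1.x range
Allow from 192.168.1
```

On Apache 2.4, the equivalent would be Require ip 127.0.0.1 192.168.1.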
Generally these sorts of requests would bounce off your firewall anyway, but on a live server (like my dev mirror sometimes is) such rules become useful for filtering out undesirable IP blocks, known risks, lots of things. By the way, in case you hadn't spotted it, lines beginning with "#" are ignored by Apache: handy for comments. Sometimes you will only want to ban one IP, perhaps some persistent robot that doesn't play by the rules..
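A sketch in the same Apache 2.2 syntax; 192.0.2.44 is a made-up address from the range reserved for documentation:

```apache
# Let everyone in, then shut out one misbehaving client
Order Allow,Deny
Allow from all
# With Allow,Deny ordering, a matching Deny wins over "Allow from all"
Deny from 192.0.2.44
```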
custom error documents
I guess I should briefly mention that .htaccess is where most folk configure their error documents, usually with something like this..
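For example, with the error pages kept in a hypothetical /errors/ folder at the site root:

```apache
# Local paths (starting with "/") are served without a redirect,
# so the visitor still receives the real error status code
ErrorDocument 403 /errors/403.html
ErrorDocument 404 /errors/404.html
ErrorDocument 500 /errors/500.html
```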
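You can also specify external URLs, though this can be problematic and is best avoided: Apache answers with a redirect, so the visitor's browser never sees the original error status (and for a 401, never gets asked for a password). One quick and simple method is to specify the text in the directive itself; you can even use HTML, though there is probably a limit to how much HTML you can squeeze onto one line. In Apache 1.3, begin the message with a " but DO NOT end with one; from Apache 2.0 on, wrap the message in a matching pair of quotes.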
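A sketch of the inline-text form; the message itself is just an example. The quoting differs between Apache versions, so the 1.3 style is shown commented out:

```apache
# Apache 2.x: the message sits inside a matching pair of quotes
ErrorDocument 403 "<h1>Forbidden</h1><p>Sorry, this folder is off limits.</p>"
# Apache 1.3: a single leading quote marks the text as a message
# ErrorDocument 403 "<h1>Forbidden</h1><p>Sorry, this folder is off limits.</p>
```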
Using a custom error document is a Very Good Idea, and will give you a second chance at your almost-lost visitors.
password protected directories
The next most obvious use for our .htaccess files is to allow access only to specific users or user groups; in other words, password-protected folders. A simple authorisation mechanism might look something like this..
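A minimal sketch, assuming the password file is the one created with htpasswd further down this page (/usr/local/var/www/html/.htpasses); the AuthName text is just an example:

```apache
# Ask for a username and password before serving anything in this folder
AuthType Basic
# Shown in the browser's login box
AuthName "Members only"
# Full filesystem path to the password file made by htpasswd
AuthUserFile /usr/local/var/www/html/.htpasses
# Any user listed in that file may enter ("Require user jimmy" for just one)
Require valid-user
```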
You can use this same mechanism to limit only certain kinds of requests, too..
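For instance, this sketch (assuming the same /usr/local/var/www/html/.htpasses password file) leaves ordinary page views open but demands a login for POST and PUT requests. Methods not named in the Limit block are not restricted at all, so be careful what you leave out:

```apache
AuthType Basic
AuthName "Members only"
AuthUserFile /usr/local/var/www/html/.htpasses
# Only these request methods need a password; all others, GET included, stay open
<Limit POST PUT>
    Require valid-user
</Limit>
```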
You can find loads of online examples of how to set up authorisation using .htaccess, and so long as you have a real user (or create one, in this case 'jimmy') with a real password (you will be prompted for this, twice) in a real password file (the -c switch creates it, overwriting any existing file, so leave -c off when adding further users)..
htpasswd -c /usr/local/var/www/html/.htpasses jimmy
..the above will work just fine. htpasswd is a tool that comes free with Apache, specifically for making and updating password files; check it out. The Windows version is the same; only the file path needs to be changed, to wherever you want to put the password file. Note: if the Apache bin/ folder isn't in your PATH, you will need to cd into that directory before running the command. Also note: you can use forward and back-slashes interchangeably with Apache/PHP on Windows, so this would work just fine..
htpasswd -c c:/unix/usr/local/Apache2/conf/.htpasses jimmy
Relative paths are fine too; assuming you were inside the bin/ directory of our fictional Apache install, the following would do exactly the same as the above..
htpasswd -c ../conf/.htpasses jimmy
Naming the password file .htpasses is a habit from when I had to keep that file inside the web site itself; as web servers are configured to ignore files beginning with .ht, it too remains hidden. If you keep your password file outside the web root (a better idea), then you can call it whatever you like, but the .ht_something habit is a good one to keep; even inside the web tree, it is secure enough for our basic purpose. Once users are logged in, you can access the REMOTE_USER environment variable, and do stuff with it..
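One thing you might do with it, sketched with mod_rewrite: send each logged-in user from a hypothetical password-protected /members/ folder to a subfolder named after them. The folder layout is an assumption, and mod_rewrite in .htaccess also needs FollowSymLinks (or SymLinksIfOwnerMatch) enabled for the folder:

```apache
# Lives in /members/.htaccess, alongside the Basic auth directives
<IfModule mod_rewrite.c>
    RewriteEngine On
    # In .htaccess, rewriting runs after authentication,
    # so REMOTE_USER already holds the login name here
    RewriteCond %{REMOTE_USER} ^(.+)$
    # A request for /members/ itself goes to /members/<username>/
    RewriteRule ^$ /members/%1/ [R,L]
</IfModule>
```

In PHP, the same name is available as $_SERVER['REMOTE_USER'].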
get better protection..
The authentication examples above assume that your web server supports "Basic" HTTP authorisation; as far as I know they all do (it's part of the standard Apache distribution). Trouble is, some browsers aren't sending passwords this way any more; personally, I'm looking to PHP to cover my authorisation needs. Basic auth works okay though, even if it isn't actually very secure: your password travels over the wire merely base64-encoded, which is as good as plain text unless the connection uses HTTPS. Not clever.
500 error
If you add something that the server doesn't understand or support, you will get a 500 error page, aka.. "the server did a boo-boo". Even directives that work perfectly on your test server at home may fail dramatically on your real site. In fact, this is a great way to find out if .htaccess files are enabled on your site: create one, put some gibberish in it, load a page in that folder, and wait for the 500 error. If there isn't one, they are probably not enabled. If they are, we need a way to safely do live testing without bringing the whole site to a 500 standstill. Fortunately, in much the same way as we used the <Limit> tag above, we can create conditional directives, things which only come into effect if certain conditions are true. The most useful of these is the <IfModule> condition, which goes something like this..
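Using PHP's default_charset setting as the example, something like this; mod_php4.c is the name Apache knows the PHP 4 module by, and mod_php5.c covers PHP 5:

```apache
<IfModule mod_php4.c>
    # Only read if the PHP 4 Apache module is loaded
    php_value default_charset utf-8
</IfModule>
<IfModule mod_php5.c>
    # The same again for PHP 5
    php_value default_charset utf-8
</IfModule>
```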
..which, placed in your master .htaccess file, would set the default character encoding of your entire site to utf-8 (a good idea!), or at least of anything output by PHP. If the PHP4 module isn't running on the server, the above .htaccess directive will do exactly nothing; Apache just ignores it. As well as protecting us against knocking the server into 500 mode, this also makes our .htaccess directives that wee bit more portable. Of course, if your syntax is messed up, no amount of if-module-ing is going to prevent an error of some kind; all the more reason to practise this stuff on a local test server.
groovy things to do with .htaccess
So far we've only scratched the surface. Aside from authorisation, the humble .htaccess file can be put to all kinds of uses.
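Directory listings, for a start: if your server has them switched off, adding Options +Indexes to a folder's .htaccess will almost certainly turn them back on again. And if you have mod_autoindex.c installed on your server (probably, yes), you can get nice fancy indexing, too..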
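A sketch of both switches. FancyIndexing comes from mod_autoindex, so it sits inside an IfModule check; the server's AllowOverride must also include Options and Indexes, or these lines will themselves cause a 500 error:

```apache
# Turn directory listings on for this folder and everything below it
Options +Indexes
<IfModule mod_autoindex.c>
    # Sortable table view instead of a plain list of links
    IndexOptions FancyIndexing
</IfModule>
```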
..which, as well as being neater, allows users to click the column titles to, for instance, order the listing by date, or file size, or whatever. It's all free, too, built into the server; we're just switching it on. You can control certain parameters too..
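For example, a few mod_autoindex IndexOptions keywords that control layout and ordering:

```apache
<IfModule mod_autoindex.c>
    # FoldersFirst: list subfolders before files (fancy indexing only)
    # NameWidth=*: widen the name column to fit the longest file name
    # DescriptionWidth=*: do the same for the description column
    IndexOptions FancyIndexing FoldersFirst NameWidth=* DescriptionWidth=*
</IfModule>
```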
Other parameters you could add include..
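A few of them, sketched as additions to whatever IndexOptions are already set; the "+" prefix adds a keyword rather than replacing the existing options:

```apache
<IfModule mod_autoindex.c>
    # IgnoreCase:    sort names without regard to upper/lower case
    # IconsAreLinks: make the file icons clickable, as well as the names
    # VersionSort:   sort "file-1.10" after "file-1.9", not before it
    IndexOptions +IgnoreCase +IconsAreLinks +VersionSort
    # Others worth knowing: ScanHTMLTitles (use each HTML file's <title>
    # as its description), SuppressDescription, SuppressLastModified,
    # SuppressSize, SuppressHTMLPreamble
</IfModule>
```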
I'm not mentioning the "XHTML" parameter in Apache2, because it still isn't!
custom directory index files
While I'm here, it's worth mentioning that .htaccess is where you can specify which files you want to use as your indexes; that is, if a user requests /foo/, Apache will serve up /foo/index.html, or whatever file you specify. You can also specify multiple files, and Apache will look for each in order and present the first one it finds. It's generally set up something like..
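A sketch; the file names are only examples, listed in the order Apache should try them:

```apache
# For a request to /foo/, serve the first of these that exists in foo/
DirectoryIndex index.php index.html index.htm
```

If none of them exists, Apache falls back to a directory listing (when Indexes is on) or a 403 error.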
It really is worth scouting around the Apache documentation; often you will find controls for things you imagined were uncontrollable, creating new possibilities and better options for your website. My experience of the magic "LAMP" (Linux-Apache-MySQL-PHP) has been.. "If you can imagine that it can be done, it can be done". Swap "Linux" for any decent operating system; the "AMP" part runs on most of them. Okay, so now we have nice fancy directories, some of them password-protected, and if you don't watch out, your site will get popular, and that means bandwidth..
save bandwidth with .htaccess!
If you pay for your bandwidth, this wee line could save you hard cash..
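The line in question is PHP's zlib.output_compression setting. This sketch assumes PHP 5 running as an Apache module (hence mod_php5.c; use the identifier for your PHP version) and wraps it in an IfModule check so it can't cause a 500 error where PHP isn't loaded:

```apache
<IfModule mod_php5.c>
    # Gzip PHP's output for browsers that say they accept it
    php_flag zlib.output_compression on
</IfModule>
```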
All it does is enable PHP's built-in transparent zlib compression. For text output such as HTML, this can often cut your bandwidth usage by half or more in one stroke, for every browser that accepts compressed pages (practically all of them). Of course, it only works with data being output by the PHP module, but if you design your pages with this in mind, you can use PHP echo statements, or better yet, PHP "includes" for your plain HTML output, and just compress everything! Remember, if you run phpsuexec, you'll need to put PHP directives in a local php.ini file, not .htaccess.
hide and deny files
Do you remember I mentioned that any file beginning with .ht is invisible? .."almost every web server in the world is configured to ignore them, by default". That is, of course, because .ht_anything files generally have server directives and passwords and stuff in them, so most servers will have something like this in their main configuration..
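Apache 2.0's stock httpd.conf, for instance, ships with a rule much like this one (Apache 2.2-era access syntax):

```apache
# Refuse any request for a file whose name starts with ".ht"
<Files ~ "^\.ht">
    Order allow,deny
    Deny from all
</Files>
```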
which instructs the server to deny access to any file beginning with .ht, effectively protecting our .htaccess and other files. The "." at the start typically keeps them out of directory listings, and the .ht rule prevents them being accessed. This version..
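A sketch of that version, in the same Apache 2.2-era syntax:

```apache
# Refuse any request for a file whose name ends in ".log"
<Files ~ "\.log$">
    Order allow,deny
    Deny from all
</Files>
```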
tells the server to deny access to *.log files. You can insert multiple file types into each rule, separating them with a pipe "|", and you can insert multiple blocks into your .htaccess file, too. I find it convenient to put all the files starting with a dot into one, and the files with denied extensions into another, something like this..
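A sketch of the two blocks, in the same Apache 2.2-era syntax; these patterns are one way to express the file list described next, so adjust them to taste:

```apache
# Files starting with a dot: ._* resource forks, .DS_Store, .ht*
<Files ~ "^\.(_|DS_Store|ht)">
    Order allow,deny
    Deny from all
</Files>
# Files with denied extensions: *.log and *.comment
<Files ~ "\.(log|comment)$">
    Order allow,deny
    Deny from all
</Files>
```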
would cover all ._* resource fork files, .DS_Store files (which the Mac Finder creates all over the place), *.log files, *.comment files and, of course, our .ht* files. You can add whatever file types you need to protect from direct access. I think it's clear now why the file is called ".htaccess".
<FilesMatch>
These days, <FilesMatch> is preferred over the regular-expression form of <Files> (<Files ~ "...">): it takes a regular expression directly (very handy) and produces cleaner, more readable code. Here's an example, which I use for my PHP-generated style sheets..
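A sketch, assuming PHP runs as an Apache module, which provides the application/x-httpd-php handler:

```apache
# Send *.css and *.style files through PHP instead of serving them as-is
<FilesMatch "\.(css|style)$">
    SetHandler application/x-httpd-php
</FilesMatch>
```

PHP labels its output text/html by default, so the style sheet scripts themselves should send header('Content-Type: text/css'); otherwise some browsers may refuse to apply them.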
Any files matching the regular expression, that is, files with a .css or .style extension, will now be handled by PHP rather than simply served up by Apache. Any <Files> statements you come across can be advantageously replaced by <FilesMatch> statements. Good to know.
more stuff
At the end of my .htaccess files, there always seems to be a section of "stuff": miscellaneous commands, mainly PHP flags and switches, so it seems logical to finish up the page with a wee selection of those..
Note: for most of the flags I've tested, you can use on/off and true/false interchangeably, as well as 0/1; also, php_value and php_flag can be switched around while things continue to work as expected! I guess, logically, booleans should always be php_flag, and values php_value; but suffice to say, if some PHP, erm, directive isn't working, these would all be good things to fiddle with! Of course, the PHP manual explains all. The bottom line is: both will work fine, but if you use the wrong type in .htaccess, say, set a php_flag using php_value, a PHP ini_get() call, for instance, would hand back the string "off" even though you had set the value to off, and a non-empty string like that evaluates to true in PHP. If you don't rely on ini_get(), or similar, it's not a problem, though clearly it's better to get it right from the start. By the way, one of the values above is incorrectly set. Did you spot it? Most PHP settings you can override inside your actual scripts (with ini_set()), but I do find it handy to be able to set defaults for a folder, or an entire site, using .htaccess.
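Putting that advice into practice, a folder-wide defaults section might look something like this sketch. The settings and values are only examples, and it assumes PHP 5 running as an Apache module (not phpsuexec or CGI):

```apache
<IfModule mod_php5.c>
    # Booleans: php_flag, with on/off
    php_flag display_errors off
    php_flag log_errors on
    php_flag short_open_tag off
    # Everything else: php_value
    php_value upload_max_filesize 8M
    php_value post_max_size 10M
    php_value max_execution_time 60
</IfModule>
```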