So search engines are tricky to understand, especially if you want them to find your site and want people to find your site through them. One of the things you may want to do is restrict which pages the search engines find. It might seem silly to exclude yourself, but I don’t want people inadvertently finding my site.
Here’s an example: I don’t let* the search engines search my monthly archives, because a page of 20-50 posts has pretty much nothing tying it together by the end of the month. In December 2006 the mention of “iPods” on the 3rd has nothing to do with any of the mentions of “family” or “Christmas”, but if someone searched for “iPods” & “family” or “iPods” & “Christmas” they might come upon that page. While I want traffic, I’m not trying to lure people here under false pretenses. This is why I have categories like “Apple” and “Friends + Family” that keep related posts in one location (and even then it’s still kind of a wide range of articles).
So far I’ve just told you why you want a ROBOTS.TXT file, but I haven’t told you how, and I’m not planning on it, because Google just put together a post on “Controlling how search engines access and index your website with ROBOTS.TXT”. It’s got a lot of links that take you all over the place, but there’s lots of good info there. If you’re just getting started, don’t get confused by all the “User-Agent:” rules; just use “User-Agent: *” for now and that will apply to all search engines (that’s all I still use, although I’m going to start working on getting some of my pages more mobile-ish).
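If it helps to see the shape of the file, here is a minimal sketch in Python that writes out a one-group ROBOTS.TXT using “User-Agent: *”. The paths in BLOCKED_PATHS and the build_robots_txt helper are made up for illustration; they are not the real paths on my site, so swap in your own.

```python
# Hypothetical paths, for illustration only; use the ones your site really has.
BLOCKED_PATHS = ["/2006/12/", "/search-results/"]


def build_robots_txt(blocked_paths):
    """Return robots.txt text with a single rule group for all crawlers."""
    lines = ["User-agent: *"]  # "*" means the group applies to every crawler
    lines += [f"Disallow: {path}" for path in blocked_paths]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(BLOCKED_PATHS)
print(robots_txt)
```

The printed text is what you would save as robots.txt at the top level of your site. Each Disallow line is a path prefix, so one line covers everything underneath it.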
For another example, I don’t let my Search Results page be indexed. This page lists the search words people have used to find my site, and if it were indexed it would just kind of create a self-fulfilling loop of search results.
On the other hand, I do allow my “Talked About” page to be indexed, since it doesn’t seem to give a lot of false positives to the search engines. Plus, once someone gets there they’ll quickly see that it’s probably not the page they’re looking for. But if they did put in some eclectic words that they see I mention often, they might hang around a bit.
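One way to double-check which pages a set of rules blocks is Python’s standard urllib.robotparser module. This is only a sketch under assumptions: the rules, the example.com URLs, and the is_allowed helper are all hypothetical stand-ins, not my actual setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the search results page, leave everything else open.
RULES = """\
User-agent: *
Disallow: /search-results/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse the rules directly, no network fetch


def is_allowed(url, agent="Googlebot"):
    """Return True if the rules above let the given crawler fetch the URL."""
    return parser.can_fetch(agent, url)


# Hypothetical URLs standing in for the two pages discussed above.
search_ok = is_allowed("http://example.com/search-results/")
talked_ok = is_allowed("http://example.com/talked-about/")
print("search results allowed:", search_ok)
print("talked about allowed:", talked_ok)
```

This only tells you what a well-behaved crawler would do with the rules; it says nothing about crawlers that ignore them.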
* Please note that no search engine is obligated to follow your rules, but if they want good results, they’ll probably want to follow the suggestions in your ROBOTS.TXT file. People wanting to steal content from your pages (text or images) will still traverse all the links on your site.