Creating a robots.txt file for a blog or website can have a real effect on search engine optimization. There is plenty of advice and guidance on how to create this file, and trangtriblog has also written a tutorial on creating a robots.txt file for a blog. Most of it is written instruction, though, and the authors rarely show what their own robots.txt files look like. So instead of listening to what they say, let's look at what they actually do.
Below I have collected the robots.txt files of many popular blogs and websites in various fields for your reference.
Some observations from trangtriblog
Only 2 of the 30 websites I checked do not use a robots.txt file.
Even if you have no special instructions for search bots, you should still have a robots.txt file.
Most of them use "User-agent: *" so that their rules apply to every search engine bot.
The thing they block most often with "Disallow" is the RSS feed.
Some sites also list their sitemap URL in the robots.txt file.
Sites with very few rules in their robots.txt file
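To see what the very common "User-agent: *" plus empty "Disallow:" combination actually means, here is a minimal sketch using Python's standard urllib.robotparser module. The example.com URLs and the bot name "AnyBot" are made up for illustration and do not come from any of the sites listed here.

```python
from urllib.robotparser import RobotFileParser

# The two-line file used by several blogs in this post.
MINIMAL_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""

# For contrast: a single slash blocks the whole site.
BLOCK_ALL_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical URLs, used only to probe the rules.
SAMPLE_URLS = [
    "http://example.com/",
    "http://example.com/feed/",
    "http://example.com/wp-admin/",
]


def allows_everything(robots_txt, sample_urls):
    """Return True if every sample URL may be fetched by a generic bot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch("AnyBot", url) for url in sample_urls)


print(allows_everything(MINIMAL_ROBOTS_TXT, SAMPLE_URLS))
print(allows_everything(BLOCK_ALL_ROBOTS_TXT, SAMPLE_URLS))
```

An empty "Disallow:" blocks nothing, so the first file is effectively an explicit "everything is allowed" statement, while "Disallow: /" in the second file shuts out every bot that honors the file.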
Problogger.net
User-agent: *
Disallow:
Marketing Pilgrim
User-agent: *
Disallow:
Search Engine Journal
User-agent: *
Disallow:
Matt Cutts
User-agent: *
Allow:
User-agent: *
Disallow: /files/
Pronet Advertising
User-agent: *
Disallow: /mt
Disallow: /*.cgi$
TechCrunch
User-agent: *
Disallow: /*/feed/
Disallow: /*/trackback/
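The Pronet Advertising and TechCrunch files rely on two pattern characters: "*" (any run of characters) and a trailing "$" (end of URL). As far as I know, Python's standard urllib.robotparser only does plain prefix matching, so the following is a small hand-written sketch of how such patterns could be interpreted, following the wildcard convention described by Google and RFC 9309. The function names and the sample paths are my own, not part of any library.

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path using * and a trailing $ into a regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(url_path, disallow_rules):
    """Return True if any non-empty Disallow pattern matches the path."""
    return any(
        rule_to_regex(rule).match(url_path) for rule in disallow_rules if rule
    )


TECHCRUNCH_RULES = ["/*/feed/", "/*/trackback/"]
PRONET_RULES = ["/mt", "/*.cgi$"]

print(is_blocked("/2008/01/some-post/feed/", TECHCRUNCH_RULES))
print(is_blocked("/scripts/form.cgi", PRONET_RULES))
print(is_blocked("/scripts/form.cgi?x=1", PRONET_RULES))
```

Under this reading, "/*/feed/" blocks the feed of every post but not a top-level "/feed/", and "/*.cgi$" blocks only URLs that end in ".cgi", not ones with a query string after it.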
Sites with many rules in their robots.txt file
Online Marketing Blog
User-agent: Googlebot
Disallow: */feed/
User-agent: *
Disallow: /Blogger/
Disallow: /wp-admin/
Disallow: /stats/
Disallow: /cgi-bin/
Disallow: /2005x/
Shoemoney
User-Agent: Googlebot
Disallow: /link.php
Disallow: /gallery2
Disallow: /gallery2/
Disallow: /category/
Disallow: /page/
Disallow: /pages/
Disallow: /feed/
Disallow: /feed
Scoreboard Media
User-agent: *
Disallow: /cgi-bin/
User-agent: Googlebot
Disallow: /category/
Disallow: /page/
Disallow: */feed/
Disallow: /2007/
Disallow: /2006/
Disallow: /wp-*
SEOMoz.org
User-agent: *
Disallow: /blogdetail.php?ID=537
Disallow: /blog?Page
Disallow: /blog/author/
Disallow: /blog/category/
Disallow: /tracker
Disallow: /UGC?Page
Disallow: /UGC/author/
Disallow: /UGC/category/
Wolf-Howl
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /noindex/
Disallow: /privacy-policy/
Disallow: /about/
Disallow: /company-biographies/
Disallow: /press-media-room/
Disallow: /newsletter/
Disallow: /contact-us/
Disallow: /terms-of-service/
Disallow: /terms-of-service/
Disallow: /information/comment-policy/
Disallow: /faq/
Disallow: /contact-form/
Disallow: /advertising/
Disallow: /information/licensing-information/
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2004/
Disallow: /*?*
Disallow: /page/
Disallow: /iframes/
John Chow
sitemap: http://www.johnchow.com/sitemap.xml
User-agent: *
Disallow: /cgi-bin/
Disallow: /go/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /author/
Disallow: /page/
Disallow: /category/
Disallow: /wp-images/
Disallow: /images/
Disallow: /backup/
Disallow: /banners/
Disallow: /archives/
Disallow: /trackback/
Disallow: /feed/
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Mediapartners-Google
Allow: /
User-agent: duggmirror
Disallow: /
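John Chow's file has separate groups for different bots. A point that is easy to miss is that a bot with its own group follows only that group and ignores the "User-agent: *" rules. Here is a rough sketch with Python's standard urllib.robotparser on a shortened copy of the file. The bot name "SomeOtherBot" and the sample post URL are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the John Chow rules listed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /feed/
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: duggmirror
Disallow: /
"""

SITE = "http://www.johnchow.com"


def build_parser(robots_txt):
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# duggmirror has its own group that blocks the entire site.
print(parser.can_fetch("duggmirror", SITE + "/"))
# A bot without its own group falls back to the "*" rules.
print(parser.can_fetch("SomeOtherBot", SITE + "/feed/"))
print(parser.can_fetch("SomeOtherBot", SITE + "/2008/some-post/"))
# Googlebot-Image uses only its own group, so the "*" block on /feed/ does not apply.
print(parser.can_fetch("Googlebot-Image", SITE + "/feed/"))
```

So if you add a group for one specific bot, remember to repeat in that group any general rules you still want it to obey.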
Smashing Magazine
Sitemap: http://www.smashingmagazine.com/sitemap.xml
User-agent: Mediapartners-Google*
Disallow:
User-agent: *
Disallow: /styles/
Disallow: /inc/
Disallow: /tag/
Disallow: /cc/
Disallow: /category/
User-agent: MSIECrawler
Disallow: /
User-agent: psbot
Disallow: /
User-agent: Fasterfox
Disallow: /
User-agent: Slurp
Crawl-delay: 200
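Smashing Magazine uses two extra directives, "Sitemap" and "Crawl-delay". A minimal sketch of reading both with Python's standard urllib.robotparser follows, on a shortened copy of the file. It assumes Python 3.8 or later (site_maps() was added in that version), and "SomeOtherBot" is a made-up bot name.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the Smashing Magazine rules listed above.
ROBOTS_TXT = """\
Sitemap: http://www.smashingmagazine.com/sitemap.xml
User-agent: *
Disallow: /tag/
User-agent: psbot
Disallow: /
User-agent: Slurp
Crawl-delay: 200
"""


def read_crawl_settings(robots_txt, agents):
    """Return the listed sitemaps and the crawl delay seen by each agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delays = {agent: parser.crawl_delay(agent) for agent in agents}
    return parser.site_maps(), delays


sitemaps, delays = read_crawl_settings(
    ROBOTS_TXT, ["Slurp", "psbot", "SomeOtherBot"]
)
print(sitemaps)
print(delays)
```

Only Slurp (Yahoo's crawler) gets a delay here, and the value is usually read as a number of seconds between requests. Crawl-delay is not honored by every search engine, so treat it as a request rather than a guarantee.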
Gizmodo
User-Agent: Googlebot
Disallow: /index.xml$
Disallow: /excerpts.xml$
Allow: /sitemap.xml$
Disallow: /*view=rss$
Disallow: /*?view=rss$
Disallow: /*format=rss$
Disallow: /*?format=rss$
Sitemap: http://gizmodo.com/sitemap.xml
Lifehacker
User-Agent: Googlebot
Disallow: /index.xml$
Disallow: /excerpts.xml$
Allow: /sitemap.xml$
Disallow: /*view=rss$
Disallow: /*?view=rss$
Disallow: /*format=rss$
Disallow: /*?format=rss$
Sitemap: http://lifehacker.com/sitemap.xml
Media Sites
Wall Street Journal
User-agent: *
Disallow: /article_email/
Disallow: /article_print/
Disallow: /PA2VJBNA4R/
Sitemap: http://online.wsj.com/sitemap.xml
ZDNet
User-agent: *
Disallow: /Ads/
Disallow: /redir/
# Disallow: /i/ is removed per 190723
Disallow: /av/
Disallow: /css/
Disallow: /error/
Disallow: /clear/
Disallow: /mac-ad
Disallow: /adlog/
# URS per bug #239819, expanded this
Disallow: /1300-
Disallow: /1301-
Disallow: /1302-
Disallow: /1303-
Disallow: /1304-
Disallow: /1305-
Disallow: /1306-
Disallow: /1307-
Disallow: /1308-
Disallow: /1309-
Disallow: /1310-
Disallow: /1311-
Disallow: /1312-
Disallow: /1313-
Disallow: /1314-
Disallow: /1315-
Disallow: /1316-
Disallow: /1317-
NY Times
# Robots.txt, www.nytimes.com 6/29/2006
#
User-agent: *
Disallow: /pages/college/
Disallow: /college/
Disallow: /library/
Disallow: /learning/
Disallow: /aponline/
Disallow: /reuters/
Disallow: /CNET/
Disallow: /partners/
Disallow: /archives/
Disallow: /indexes/
Disallow: /thestreet/
Disallow: /NYTimes-partners/
Disallow: /financialtimes/
Allow: /pages/
Allow: /2003/
Allow: /2004/
Allow: /2005/
Allow: /top/
Allow: /ref/
Allow: /services/xml/
User-agent: Mediapartners-Google*
Disallow:
Disallow:
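The NY Times file mixes "Disallow" and "Allow" lines that overlap: "/pages/college/" is blocked while "/pages/" is allowed. One common way to resolve such overlaps, described by Google and by RFC 9309, is that the longest matching path wins and "Allow" wins a tie. Below is a small sketch of that rule for plain prefixes only. The function name and the sample paths are my own, and other crawlers may resolve overlaps differently.

```python
# A few of the NY Times rules listed above, as (directive, path) pairs.
NYT_RULES = [
    ("Disallow", "/pages/college/"),
    ("Disallow", "/college/"),
    ("Disallow", "/archives/"),
    ("Allow", "/pages/"),
    ("Allow", "/2005/"),
]


def is_allowed(path, rules):
    """Longest matching prefix decides; Allow wins ties; no match means allowed."""
    best_length = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules never decide the outcome
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_length
        tie_for_allow = len(prefix) == best_length and is_allow
        if longer or tie_for_allow:
            best_length = len(prefix)
            allowed = is_allow
    return allowed


print(is_allowed("/pages/college/index.html", NYT_RULES))
print(is_allowed("/pages/business/index.html", NYT_RULES))
print(is_allowed("/archives/1999/story.html", NYT_RULES))
```

With this rule the order of the lines does not matter: the more specific "/pages/college/" rule beats the shorter "/pages/" rule, so the college pages stay blocked while the rest of "/pages/" is open.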
YouTube
# Robots.txt file for YouTube
User-agent: Mediapartners-Google*
Disallow:
User-agent: *
Disallow: /profile
Disallow: /results
Disallow: /browse
Disallow: /t/terms
Disallow: /t/privacy
Disallow: /login
Disallow: /watch_ajax
Disallow: /watch_queue_ajax
That leaves Google. What does its own file look like?
Google
User-agent: *
Allow: /searchhistory/
Disallow: /news?&Output=xhtml
Allow: /news?Output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogs
Disallow: /news
Disallow: /nwshp
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
Disallow: /Univ/
Disallow: /Cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /advanced_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /SWR
Disallow: /url
Disallow: /default
Disallow: /m?
Disallow: /m/search?
Disallow: /wml?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/search?
Poster : beStyle