Collection of robots.txt files

Creating a robots.txt file for a blog or website can have a significant effect on search engine optimization. There is plenty of advice and guidance on how to create this file, and trangtriblog has also written a tutorial on creating a robots.txt file for a blog. But the people who write these instructions rarely show what their own robots.txt files look like. So instead of listening to what they say, let's look at what they actually do.

Below I have collected the robots.txt files of many popular blogs and websites in various fields for your reference.

Some observations from trangtriblog

Only 2 of the 30 websites I checked do not use a robots.txt file.
Even if you have no special instructions for search bots, you should still have a robots.txt file.
Most of them use "User-agent: *" so that their rules apply to all search engine bots.
The path they most often block with "Disallow" is the RSS feed.
Some sites also list the URL of their sitemap in the robots.txt file.
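
To make these observations concrete, here is a minimal sketch that checks two small files with Python's standard `urllib.robotparser` module. The helper name `parse_robots` and the `example.com` URLs are assumptions made for illustration; they do not come from any of the sites listed here.

```python
from urllib.robotparser import RobotFileParser


def parse_robots(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The most common file in this collection: all bots, nothing blocked.
allow_all = parse_robots("User-agent: *\nDisallow:\n")

# A file that blocks only the RSS feed for all bots.
block_feed = parse_robots("User-agent: *\nDisallow: /feed/\n")

print(allow_all.can_fetch("Googlebot", "http://example.com/feed/"))
print(block_feed.can_fetch("Googlebot", "http://example.com/feed/"))
print(block_feed.can_fetch("Googlebot", "http://example.com/about/"))
```

An empty "Disallow:" line blocks nothing, so the first check reports that the feed may be fetched, while the second file blocks the feed and leaves the rest of the site open.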

Sites that use a very minimal robots.txt file

Problogger.net

User-agent: *
Disallow:


Marketing Pilgrim

User-agent: *
Disallow:

Search Engine Journal

User-agent: *
Disallow:

Matt Cutts

User-agent: *
Allow:
User-agent: *
Disallow: /files/

Pronet Advertising

User-agent: *
Disallow: /mt
Disallow: /*.cgi$

TechCrunch

User-agent: *
Disallow: /*/feed/
Disallow: /*/trackback/
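
The `*` and `$` characters in patterns such as the ones above are wildcard extensions understood by the major search engines: `*` stands for any run of characters and a trailing `$` anchors the end of the URL. As far as I know, Python's standard `urllib.robotparser` only compares plain prefixes, so the following is a rough sketch of how such patterns could be matched by hand. The function names and the sample paths are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: list) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


feed_rules = ["/*/feed/", "/*/trackback/"]
print(is_blocked("/2008/01/some-post/feed/", feed_rules))
print(is_blocked("/2008/01/some-post/", feed_rules))
print(is_blocked("/cgi-bin/test.cgi", ["/*.cgi$"]))
print(is_blocked("/cgi-bin/test.cgi?x=1", ["/*.cgi$"]))
```

Under this reading, "/*/feed/" blocks the feed of every post but not the post itself, and "/*.cgi$" blocks only URLs that end in ".cgi".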

Sites that use a robots.txt file with many rules

Online Marketing Blog

User-agent: Googlebot
Disallow: */feed/

User-agent: *
Disallow: /Blogger/
Disallow: /wp-admin/
Disallow: /stats/
Disallow: /cgi-bin/
Disallow: /2005x/

Shoemoney

User-Agent: Googlebot
Disallow: /link.php
Disallow: /gallery2
Disallow: /gallery2/
Disallow: /category/
Disallow: /page/
Disallow: /pages/
Disallow: /feed/
Disallow: /feed

Scoreboard Media

User-agent: *
Disallow: /cgi-bin/

User-agent: Googlebot
Disallow: /category/
Disallow: /page/
Disallow: */feed/
Disallow: /2007/
Disallow: /2006/
Disallow: /wp-*

SEOMoz.org

User-agent: *
Disallow: /blogdetail.php?ID=537
Disallow: /blog?page
Disallow: /blog/author/
Disallow: /blog/category/
Disallow: /tracker
Disallow: /UGC?page
Disallow: /UGC/author/
Disallow: /UGC/category/

Wolf-Howl

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /noindex/
Disallow: /privacy-policy/
Disallow: /about/
Disallow: /company-biographies/
Disallow: /press-media-room/
Disallow: /newsletter/
Disallow: /contact-us/
Disallow: /terms-of-service/
Disallow: /terms-of-service/
Disallow: /information/comment-policy/
Disallow: /faq/
Disallow: /contact-form/
Disallow: /advertising/
Disallow: /information/licensing-information/
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2004/
Disallow: /*?*
Disallow: /page/
Disallow: /iframes/

John Chow

sitemap: http://www.johnchow.com/sitemap.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /go/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /author/
Disallow: /page/
Disallow: /category/
Disallow: /wp-images/
Disallow: /images/
Disallow: /backup/
Disallow: /banners/
Disallow: /archives/
Disallow: /trackback/
Disallow: /feed/

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Mediapartners-Google
Allow: /

User-agent: duggmirror
Disallow: /

Smashing Magazine

Sitemap: http://www.smashingmagazine.com/sitemap.xml

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /styles/
Disallow: /inc/
Disallow: /tag/
Disallow: /cc/
Disallow: /category/

User-agent: MSIECrawler
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Fasterfox
Disallow: /

User-agent: Slurp
Crawl-delay: 200
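
Files like this one give different bots different rule groups. The sketch below feeds a shortened excerpt of the file above to Python's standard `urllib.robotparser` to show how the groups are selected; the `example.com` URLs are placeholders, and the results describe how this particular parser reads the file, not how each real crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the file above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/

User-agent: psbot
Disallow: /

User-agent: Slurp
Crawl-delay: 200
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Bots without their own group fall back to the "*" group.
print(parser.can_fetch("Googlebot", "http://example.com/tag/css/"))
print(parser.can_fetch("Googlebot", "http://example.com/2008/some-post/"))

# psbot has its own group and is blocked from the whole site.
print(parser.can_fetch("psbot", "http://example.com/2008/some-post/"))

# Slurp has its own group too, which only sets a delay between requests.
print(parser.crawl_delay("Slurp"))
print(parser.can_fetch("Slurp", "http://example.com/tag/css/"))
```

Note the last check: because Slurp matches its own group, this parser does not apply the "*" rules to it, so "/tag/" is not blocked for Slurp unless the Disallow lines are repeated in its group.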

Gizmodo

User-Agent: Googlebot
Disallow: /index.xml$
Disallow: /excerpts.xml$
Allow: /sitemap.xml$
Disallow: /*view=rss$
Disallow: /*?view=rss$
Disallow: /*format=rss$
Disallow: /*?format=rss$
Sitemap: http://gizmodo.com/sitemap.xml

Lifehacker

User-Agent: Googlebot
Disallow: /index.xml$
Disallow: /excerpts.xml$
Allow: /sitemap.xml$
Disallow: /*view=rss$
Disallow: /*?view=rss$
Disallow: /*format=rss$
Disallow: /*?format=rss$
Sitemap: http://lifehacker.com/sitemap.xml

Media Sites

Wall Street Journal

User-agent: *
Disallow: /article_email/
Disallow: /article_print/
Disallow: /PA2VJBNA4R/
Sitemap: http://online.wsj.com/sitemap.xml
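
A "Sitemap:" line is not tied to any User-agent group, so it can sit anywhere in the file. As a small sketch, the standard `urllib.robotparser` module (Python 3.8 and later) can read it back with `site_maps()`; the variable name `wsj` is just an illustrative choice, and the text is parsed locally rather than downloaded.

```python
from urllib.robotparser import RobotFileParser

# The Wall Street Journal file shown above.
WSJ_ROBOTS_TXT = """\
User-agent: *
Disallow: /article_email/
Disallow: /article_print/
Disallow: /PA2VJBNA4R/
Sitemap: http://online.wsj.com/sitemap.xml
"""

wsj = RobotFileParser()
wsj.parse(WSJ_ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file has none.
print(wsj.site_maps())
print(wsj.can_fetch("*", "http://online.wsj.com/article_print/123.html"))
```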

ZDNet

User-agent: *
Disallow: /Ads/
Disallow: /redir/
# Disallow: /i/ is removed per 190723
Disallow: /av/
Disallow: /css/
Disallow: /error/
Disallow: /clear/
Disallow: /mac-ad
Disallow: /adlog/
# URS per bug #239819, this is expanded
Disallow: /1300-
Disallow: /1301-
Disallow: /1302-
Disallow: /1303-
Disallow: /1304-
Disallow: /1305-
Disallow: /1306-
Disallow: /1307-
Disallow: /1308-
Disallow: /1309-
Disallow: /1310-
Disallow: /1311-
Disallow: /1312-
Disallow: /1313-
Disallow: /1314-
Disallow: /1315-
Disallow: /1316-
Disallow: /1317-

NY Times

# Robots.txt, www.nytimes.com 6/29/2006
#
User-agent: *
Disallow: /pages/college/
Disallow: /college/
Disallow: /library/
Disallow: /learning/
Disallow: /aponline/
Disallow: /reuters/
Disallow: /CNET/
Disallow: /partners/
Disallow: /archives/
Disallow: /indexes/
Disallow: /thestreet/
Disallow: /NYTimes-partners/
Disallow: /financialtimes/
Allow: /pages/
Allow: /2003/
Allow: /2004/
Allow: /2005/
Allow: /top/
Allow: /ref/
Allow: /services/xml/

User-agent: Mediapartners-Google*
Disallow:
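
When a file mixes Allow and Disallow like this one, a URL can match more than one rule. The current robots.txt standard (RFC 9309) resolves this by letting the longest matching path win, with Allow winning a tie. The sketch below illustrates that rule for plain prefixes only, using an excerpt of the rules above; the function `is_allowed` and the sample paths are hypothetical, and individual crawlers may resolve conflicts differently.

```python
def is_allowed(path: str, rules: list) -> bool:
    """Longest-match rule: the most specific matching prefix decides."""
    best_len, allowed = -1, True  # a path that matches no rule is allowed
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            # A longer prefix wins; on equal length, Allow wins.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


# Excerpt of the NY Times rules shown above.
NYT_RULES = [
    ("Disallow", "/pages/college/"),
    ("Disallow", "/library/"),
    ("Allow", "/pages/"),
    ("Allow", "/2005/"),
]

print(is_allowed("/pages/college/admissions.html", NYT_RULES))
print(is_allowed("/pages/politics/index.html", NYT_RULES))
print(is_allowed("/library/books/", NYT_RULES))
```

Here "/pages/college/" is longer than "/pages/", so the college pages stay blocked even though the rest of "/pages/" is explicitly allowed.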

YouTube

# Robots.txt file for YouTube

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /profile
Disallow: /results
Disallow: /browse
Disallow: /t/terms
Disallow: /t/privacy
Disallow: /login
Disallow: /watch_ajax
Disallow: /watch_queue_ajax

And what about Google?

Google

User-agent: *
Allow: /searchhistory/
Disallow: /news?&output=xhtml
Allow: /news?output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogs
Disallow: /news
Disallow: /nwshp
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
Disallow: /Univ/
Disallow: /Cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /advanced_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /SWR
Disallow: /url
Disallow: /default
Disallow: /m?
Disallow: /m/search?
Disallow: /wml?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/search?

Poster : beStyle