 What If You Don't Want Your Pages To Be Crawled and Cached
Posted by Shawn Tracer on March 01, 2008, 05:26:09 PM


What If You Don't Want Your Pages To Be Crawled and Cached
 by: Jerry Yu

Some website owners have pages that they want to hide from the general public. These pages meet the following criteria:

    * They are accessible only to trusted users who know the page URLs.
    * No links on the website point to these pages.
    * No username or password is required; anyone who knows a page URL can open it.

Let's see this scenario:

Suppose that one day you create a page and do not link to it from anywhere on your site. You then give the page's URL to your family members, assuming nobody else will find it. That assumption is a mistake. Google and Yahoo can discover the page if you or any family member ever visits it using a browser with either the Google Toolbar (with PageRank enabled) or the Yahoo Companion Toolbar installed.

PageRank function

When you use the Google Toolbar with PageRank enabled, the toolbar automatically sends the URL of each page you visit to Google, where it is recorded in Google's database. If a page URL is not already in that database, Googlebot, Google's robot, will visit the page later to index it.

Your surfing activity is tracked whether you use the toolbar to search the web or type a page's URL directly into the Google search page. Either way, Google records your visits.

One day, when you check which pages on your site have been indexed by Google, your hidden page comes up and you are worried. Furthermore, the page is cached. Even if you remove that page from your site, it can still be found and viewed through the cached version.

How to check what pages have been indexed?

Go to Google and type in "site:www.yoursite.com" without the quotes. This query lists the pages on your site that have been indexed, but it displays at most 999 results because that is the limit Google sets for any query.
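
If you check several sites regularly, you may prefer to build the query URL once and bookmark it. Here is a minimal Python sketch of that idea. The function name site_query_url is a hypothetical helper chosen for illustration, and the sketch assumes Google's usual /search?q= address format.

```python
from urllib.parse import urlencode


def site_query_url(hostname):
    """Build a Google search URL for the query site:<hostname>."""
    query = "site:" + hostname
    # urlencode percent-encodes the colon, so the query survives as one parameter.
    return "https://www.google.com/search?" + urlencode({"q": query})


# www.yoursite.com is the placeholder hostname used in this article.
print(site_query_url("www.yoursite.com"))
```

Open the printed URL in your browser to see the same listing you would get by typing the query by hand.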

How to prevent your hidden pages from being indexed and cached?

One simple but unreliable solution is to disable the PageRank function on the toolbar. To stop Google from automatically tracking your surfing information, uncheck the PageRank checkbox.

Steps to disable the PageRank function:

    * Click the Options button on the toolbar (it is labeled "Options").
    * In the Options tab of the pop-up window, uncheck the PageRank checkbox.

See the Google Toolbar Privacy Policy at http://toolbar.google.com/privacy.html for the information Google collects.

Unfortunately, disabling the PageRank function will not completely solve your problem because, in our example, your family members could still have PageRank enabled.

A sound solution

You can tackle the problem with the robots meta tag in HTML. The following two tags are the ones you need. Put them in the <head> section of your HTML documents.

<meta name="robots" content="noindex,nofollow">

Search engines will read this page but will not index it, and they will not follow any links on this page to other pages.

<meta name="robots" content="noarchive">

Search engines will not archive/cache the page content.
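
To confirm that a page really carries these tags, you could parse its HTML and list the robots directives it declares. The following is a minimal Python sketch that uses only the standard library. The names RobotsMetaParser and robots_directives are hypothetical, chosen for illustration, and the sample page is made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (for example <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html_text):
    """Return the set of robots directives declared in html_text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives


# Made-up page that uses the two tags shown above.
sample_page = """<html><head>
<meta name="robots" content="noindex,nofollow">
<meta name="robots" content="noarchive">
</head><body>Family photos</body></html>"""

print(sorted(robots_directives(sample_page)))
```

If noindex, nofollow, or noarchive is missing from the printed list, the matching tag is probably absent or misspelled.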

How to remove an indexed and cached page

If your page has already been indexed and cached, do the following to remove it from search engine databases:

1. Add <meta name="robots" content="noindex,nofollow,noarchive"> to the <head> section of your page. The next time Googlebot or another robot visits your page, the page will be removed from its index and cache.

2. Do what Google suggests. "If you believe your request is urgent and cannot wait until the next time Google crawls your site, use our automatic URL removal system. In order for this automated process to work, your webmaster must first insert the appropriate meta tags into the page's HTML code."

(cited from Google web site Remove Content from Google's Index at http://www.google.com/remove.html)

One last note. Is your page now 100% hidden? Not really. If the hidden page contains outbound links and you click one to navigate to another website, your hidden page's URL will appear in that site's web traffic log as the HTTP referer.

If it suits your needs, you can remove outbound links from your hidden pages.
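
To find the links that could leak a hidden page's URL this way, you could scan the page for links that point to a different host. Below is a minimal Python sketch using only the standard library. The names LinkCollector and outbound_links, the page URL, and the sample HTML are all hypothetical and used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outbound_links(html_text, page_url):
    """Return absolute http(s) links whose host differs from page_url's host."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    own_host = urlparse(page_url).netloc.lower()
    outbound = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc.lower() != own_host:
            outbound.append(absolute)
    return outbound


# Made-up hidden page with one internal link and one outbound link.
hidden_page_url = "http://www.yoursite.com/family/photos.html"
hidden_page_html = """<html><body>
<a href="album2.html">More photos</a>
<a href="http://www.example.com/maps">Directions</a>
</body></html>"""

print(outbound_links(hidden_page_html, hidden_page_url))
```

Each link the sketch reports is one that would send your hidden page's URL to another site's log when clicked.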

More resources

    * All About Robots Meta HTML Tag at http://www.WebActionGuide.com/kb/robots-meta-tag.php
    * Googlebot help page at http://www.google.com/bot.html
    * Yahoo Slurp help page at http://help.yahoo.com/help/us/ysearch/slurp/index.html
    * Comments about Google PageRank at http://www.google-watch.org/outdated/pagerank.html

What If...

Now you know how to safeguard any page on your site. What if you want to keep robots from visiting any of the files in a directory? The answer is in my article Robots.txt And Search Engine Robots - http://www.WebActionGuide.com/kb/robots-txt.php

About The Author

Jerry Yu is an experienced internet marketer and web developer. Visit his site http://www.WebActionGuide.com for free step-by-step "how-to" action guides, tips, knowledge base articles, and more.
