Thursday, October 29, 2009

How To: Search Engine Webpage Removal - A Search Engine Entry Removal Roundup

If you run a website of any type, there is a good chance that you'll want to remove content from Google, Bing, and other search engines at some point, either due to outdated information or sensitive data exposure. Below are links to the documentation provided by each of the major search engines for their removal process.

Most search engines say your first step should be to create a robots.txt file that blocks the content, and many also want the removed URL to return a 404 error. If you skip these steps, they may keep your content cached longer than they otherwise would.
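As a rough illustration of the robots.txt step, here is a minimal Python sketch that builds a robots.txt body and uses the standard library's urllib.robotparser to check that a URL would be blocked. The helper names (build_robots_txt, is_blocked), the example paths, and the example.com URLs are made up for this example. Each search engine's own documentation remains the authority on how its crawler reads the file.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt text that disallows the given paths (hypothetical helper)."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if the robots.txt text tells user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example paths are placeholders; substitute the content you want removed.
    robots_txt = build_robots_txt(["/private/", "/old-report.html"])
    print(robots_txt)
    print(is_blocked(robots_txt, "https://example.com/private/data.html"))
    print(is_blocked(robots_txt, "https://example.com/index.html"))
```

Checking the file locally like this only tells you what a well-behaved crawler should do. It does not remove anything that is already cached, which is why the removal tools listed below still matter.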

Google

First, you can build and submit a removal request for information, images, or outdated or inappropriate content.

Next, you can remove your own content and then use Google's webpage removal request tool to get it re-indexed more quickly.

Finally, make sure you follow Google's noindex meta tag and robots.txt instructions.
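If you want to confirm that a page actually carries a noindex meta tag before you ask for removal, a small check like the following may help. It is only a sketch using Python's built-in html.parser. The class and function names (RobotsMetaParser, has_noindex) are made up for this example, and it assumes the tag is written as a meta element named "robots" or "googlebot" with a comma-separated content attribute.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> style tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None when an attribute has no value.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() not in ("robots", "googlebot"):
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(page))
```

Keep in mind that a crawler has to be able to fetch the page to see the tag at all, so check how this interacts with any robots.txt rules you have in place.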

Yahoo!

With Yahoo's move to the Bing search engine, their removal process has changed. You can use their SiteExplorer tool to remove your site from their results.

Ask (formerly Ask Jeeves)

Ask supports only robots.txt and has no formally published removal process.

Bing

Microsoft's new search engine has recently published removal instructions.

AltaVista

Per AltaVista's support information,

"If an AltaVista user comes across web pages that contain private personal, professional or financial information that is not available to the public and/or may have been illegally obtained, he or she can write to legal-support-uk@av.com to request that the offending URL be removed from AltaVista's index. Please note that removing said URL from AltaVista's index does not remove the URL from the public internet or the indexes of other search engines."

Archive.org / the Wayback Machine

Archive.org provides a long-term snapshot of much of the Internet, dated by when each page was crawled. If your site has been available for any length of time and has static content that can be crawled, there's a good chance it has been archived, and you'll want to contact Archive.org to request exclusion.
