Google has just announced that it will no longer obey the unsupported Noindex directive in robots.txt, which many sites used to keep pages crawled by Googlebot out of the index.
Most of us have been using this directive to keep admin pages and other files that were never meant to appear in search results out of Google and other search engines.
According to Google's official tweet, webmasters have until 1 September 2019 to switch to alternative methods of keeping Googlebot away from those files and folders.
What alternatives do webmasters have for Noindex?
- Noindex in robots meta tags - Add a noindex robots meta tag to any page you want removed from the index:
<meta name="robots" content="noindex" />
to remove any URL from indexing. The directive works both as an HTML meta tag and as an HTTP response header.
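The header form of the directive is X-Robots-Tag, which is handy for non-HTML files such as PDFs that cannot carry a meta tag. A response using it might look like this (the rest of the response is illustrative):

```http
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex
```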
- 404 and 410 HTTP status codes - 410 (Gone) is the preferred way to politely ask Googlebot to drop a URL that has already been crawled: it signals that the page has been removed permanently and will not come back. 404 (Not Found) also leads to removal eventually, but since it only says the page could not be found, I don't recommend it for this purpose.
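As a minimal sketch, a WSGI app could answer 410 Gone for removed URLs (the paths and messages here are hypothetical):

```python
# Serve 410 Gone for permanently removed URLs so crawlers drop
# them from the index; everything else gets a normal 200.
REMOVED_PATHS = {"/old-admin", "/retired-page"}  # hypothetical removed URLs

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in REMOVED_PATHS:
        # 410 tells Googlebot the page is gone for good.
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"This page has been removed permanently."]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK"]

# Quick check without running a real server: record the status line.
statuses = []
app({"PATH_INFO": "/old-admin"}, lambda status, headers: statuses.append(status))
print(statuses[0])  # 410 Gone
```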
- Password protection - Pages behind a login are automatically dropped from the index, since Googlebot cannot access them.
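On Apache, for instance, HTTP Basic authentication can be enabled with an .htaccess fragment along these lines (the AuthUserFile path is a placeholder):

```apacheconf
AuthType Basic
AuthName "Restricted area"
AuthUserFile /path/to/.htpasswd
Require valid-user
```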
- Disallow in robots.txt - You can use a Disallow rule in robots.txt to block Googlebot from crawling URLs you would like to hide. Keep in mind that Disallow only stops crawling; a blocked URL can still end up indexed if other pages link to it.
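Python's standard library can sanity-check a Disallow rule before you deploy it; the user agent and paths below are illustrative:

```python
# Verify a robots.txt Disallow rule with the stdlib parser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# /admin/ is blocked for Googlebot, everything else is allowed.
print(parser.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
```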
- Search Console - This is by far the simplest way to remove an already indexed URL. Log in to https://www.google.com/webmasters and navigate to the URL removal tool; note that removals made this way are temporary.