Question

Say my site children.com (which I want indexed) is also reachable at http://mother.com/children/ (which I don't want indexed).

Example hierarchy:

/home/username/mother: http://mother.com
    |_ children: http://www.children.com
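The problem comes from one directory being served under two hosts. A minimal Python sketch of that mapping, assuming (as the hierarchy suggests) that mother.com's document root is /home/username/mother and children.com's document root is the nested children directory; the function name public_urls is hypothetical:

```python
from pathlib import PurePosixPath

# Assumed layout from the question: children.com's document root is nested
# inside mother.com's document root.
MOTHER_ROOT = PurePosixPath("/home/username/mother")
CHILDREN_ROOT = MOTHER_ROOT / "children"


def public_urls(file_path: str) -> list[str]:
    """Return every public URL under which a file in these roots is served."""
    path = PurePosixPath(file_path)
    urls = []
    if path.is_relative_to(CHILDREN_ROOT):
        # Served by the children.com virtual host.
        urls.append("http://www.children.com/" + path.relative_to(CHILDREN_ROOT).as_posix())
    if path.is_relative_to(MOTHER_ROOT):
        # Also served by mother.com, because the directory sits inside its root.
        urls.append("http://mother.com/" + path.relative_to(MOTHER_ROOT).as_posix())
    return urls


print(public_urls("/home/username/mother/children/about.shtml"))
```

Every file under the children directory therefore has a second, unwanted URL on mother.com, which is the duplicate the question is trying to keep out of the index.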

What would I put in mother.com's robots.txt file to keep the content of children.com, including all of its subdirectories, from being indexed as part of mother.com?

Thanks for any advice.

Answer 1

I solved my own problem and confirmed the result with the phpwebby robots.txt analyzer. I put the following rules in the mother.com/robots.txt file:

User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow: /

User-agent: Adsbot-Google
Disallow: /

User-agent: Jeeves
Disallow: /

User-agent: Slurp
Disallow: /

User-agent: Yahoo-MMCrawler
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: psbot
Disallow: /

User-agent: *
Disallow: /
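One way to reproduce the analyzer check locally is Python's standard-library urllib.robotparser. This is a substitute for the phpwebby tool, not what the answer used, and real crawlers may match user agents slightly differently. The sketch parses an abbreviated copy of the mother.com rules (two of the named bots plus the catch-all group) and asks whether a few crawlers may fetch a page under /children/:

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the mother.com rules above; the remaining named
# groups are identical in effect to the catch-all "*" group.
MOTHER_ROBOTS = """\
User-agent: Googlebot
Disallow: /

User-agent: Slurp
Disallow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(MOTHER_ROBOTS.splitlines())  # parse() also marks the rules as loaded

url = "http://mother.com/children/about.shtml"
for agent in ("Googlebot", "Slurp", "SomeOtherBot"):
    # A named group applies to its bot; any other bot falls back to "*".
    print(agent, parser.can_fetch(agent, url))
```

Because `User-agent: *` with `Disallow: /` already covers every crawler, the per-bot groups are redundant here. Note also that `Disallow: /` blocks all of mother.com, not only the mirrored copy; blocking just the mirror would use `Disallow: /children/` instead.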

I then added the following to my children.com robots.txt file:

User-agent: *
# Block crawling of the email and print versions of pages
Disallow: /*~email.shtml
Disallow: /*~print.shtml
Sitemap: http://www.children.com/sitemap_index.xml
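The `/*~email.shtml` lines rely on the `*` wildcard, an extension that Google and Bing support and that RFC 9309 standardizes together with a trailing `$` end anchor. A simple prefix-matching parser will not interpret it, so here is a hypothetical matcher (the names `pattern_to_regex` and `is_allowed` are my own) following the RFC's precedence: the longest matching pattern wins, and Allow beats Disallow on a tie. It illustrates the rules and is not any crawler's actual code:

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end of
    the path, and everything else is matched literally from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            if len(pattern) > best_len or (len(pattern) == best_len and kind == "allow"):
                best_len = len(pattern)
                allowed = kind == "allow"
    return allowed


# The Disallow lines from the children.com robots.txt above.
CHILDREN_RULES = [
    ("disallow", "/*~email.shtml"),
    ("disallow", "/*~print.shtml"),
]

for path in ("/articles/story.shtml", "/articles/story~email.shtml", "/story~print.shtml"):
    print(path, is_allowed(CHILDREN_RULES, path))
```

With these rules, normal pages stay crawlable while any path containing `~email.shtml` or `~print.shtml` is blocked, wherever it sits in the directory tree.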

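Of course, I triple-checked with a robots.txt analyzer that the subdirectories are blocked to crawlers when reached through the mother.com domain and can still be indexed through the children.com domain.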

Note: mother.com and children.com are just example domain names.

Answer 2

You may not actually want to use robots.txt at all. Instead, use a combination of the robots meta tag and canonical tags.

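On every mother.com/children page, add a robots meta tag with the value "noindex". Search engines will still be able to crawl those pages, but they will not add them to their index. That alone can still leave some confusion about where the authoritative copy of the content lives.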
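Because mother.com/children/ and children.com serve the same files, the noindex tag has to be emitted only when a page is requested through mother.com. A minimal Python sketch, assuming the pages are rendered by code that can see the request URL (the helper name robots_meta_tag is hypothetical). Keep in mind that a crawler only sees the tag if it is allowed to fetch the page, so these URLs must not also be disallowed in mother.com's robots.txt:

```python
from urllib.parse import urlsplit

NOINDEX_TAG = '<meta name="robots" content="noindex">'
MOTHER_HOSTS = ("mother.com", "www.mother.com")  # assumed host names


def robots_meta_tag(request_url: str) -> str:
    """Return the robots meta tag for a page, based on the URL it was requested under."""
    parts = urlsplit(request_url)
    host = parts.hostname or ""  # hostname is already lower-cased
    on_mirror = host in MOTHER_HOSTS and (
        parts.path == "/children" or parts.path.startswith("/children/")
    )
    # Only the mother.com copy is kept out of the index; children.com stays indexable.
    return NOINDEX_TAG if on_mirror else ""


print(robots_meta_tag("http://mother.com/children/about.shtml"))
```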

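To resolve that, use cross-domain canonical tags to tell the major search engines where the authoritative content lives: on each page under mother.com/children, add a canonical tag whose value is the corresponding URL on children.com. Make sure each specific page is canonicalized to the same content on children.com, because canonical tags really only work for identical content.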
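A companion sketch for the cross-domain canonical tag: it maps each mother.com/children/... URL one-to-one onto the same path on children.com, so every page points at its own identical copy rather than, say, the children.com home page. The helper name canonical_link, the host names, and the decision to drop query strings (reasonable only for static pages) are assumptions for illustration:

```python
import html
from urllib.parse import urlsplit

CHILDREN_ORIGIN = "http://www.children.com"      # assumed canonical origin
MOTHER_HOSTS = ("mother.com", "www.mother.com")  # assumed mirror host names
MIRROR_PREFIX = "/children"


def canonical_link(request_url: str) -> str:
    """Build a <link rel="canonical"> tag pointing at the children.com copy of a page."""
    parts = urlsplit(request_url)
    path = parts.path or "/"
    if (parts.hostname or "") in MOTHER_HOSTS:
        if not (path == MIRROR_PREFIX or path.startswith(MIRROR_PREFIX + "/")):
            return ""  # mother.com's own pages are out of scope
        # Strip the prefix so /children/x on mother.com maps to /x on children.com.
        path = path[len(MIRROR_PREFIX):] or "/"
    # Query strings are dropped; the path is escaped for use in an HTML attribute.
    href = html.escape(CHILDREN_ORIGIN + path, quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_link("http://mother.com/children/about.shtml"))
```

Requests that arrive on children.com itself get a self-referencing canonical, which is harmless and keeps both copies of each page agreeing on the same URL.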

Preventing indexing of a parent domain's subdirectory
