Question

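I have a WordPress site on the root domain. I have now added a forum in a subfolder, at mydomain/forum, and it generates its own sitemap: mydomain/forum/sitemap_index.xml. I submitted that sitemap to Google, and it seems Google cannot access the child sitemaps. It reports "URL blocked by robots.txt" with these values: mydomain/forum/sitemap-forums.xml?page=1 and mydomain/forum/sitemap-index.xml?page=1.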

Here is my robots.txt:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads


# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

Sitemap: mydomain/sitemap_index.xml
Sitemap: mydomain/forum/sitemap_index.xml

What should I add to robots.txt? Any help would be greatly appreciated. Thanks in advance.

Answer 1

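To clarify, I'm assuming that 'mydomain' in your examples stands in for the scheme plus the fully qualified domain name, right? (e.g. "http://www.whatever.com" rather than "whatever.com" or "www.whatever.com") I think it must be, since the Google error message uses the same format.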

The error message suggests that Google is getting the URL from somewhere other than your robots.txt file. The robots.txt file lists the sitemap URL as:

mydomain/forum/sitemap_index.xml

But the error message shows Google trying to load the URL:

mydomain/forum/sitemap-index.xml?page=1

This second URL is blocked because your robots.txt file blocks any URL that contains a question mark:

Disallow: /*?*
Disallow: /*?
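To see why the "?page=1" URL is caught while the plain sitemap URL is not, here is a minimal Python sketch of path matching, assuming Google's documented wildcard rules ("*" matches any run of characters, a trailing "$" anchors the end, otherwise a rule is a prefix match). The helper name rule_matches is hypothetical; this is an illustration, not Google's actual implementation.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path (query included).

    Assumes Google-style semantics: '*' matches any run of characters, a
    trailing '$' anchors the end of the URL, otherwise the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every character literally, then turn each escaped '*' into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    # re.match only anchors at the start, which gives prefix-match behaviour.
    return re.match(regex, path) is not None


blocked = "/forum/sitemap-index.xml?page=1"  # URL from the Google error
listed = "/forum/sitemap_index.xml"          # URL listed in robots.txt

for pattern in ("/*?*", "/*?"):
    print(pattern, rule_matches(pattern, blocked), rule_matches(pattern, listed))
# /*?* True False
# /*? True False
```

Both patterns give the same result: because every rule is already a prefix match, a trailing "*" adds nothing.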

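(By the way, "Disallow: /*?*" and "Disallow: /*?" are exactly equivalent, so you can safely remove the first one.) Google can still read the sitemap file through the simpler URL, though, so the pages will probably still get crawled. If you really want to get rid of the error message, you can always add this line to the "User-agent: *" group: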

Allow: /forum/sitemap-index.xml?page=1

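This overrides the disallow for that sitemap URL only, because Google applies the most specific (longest) matching rule. (This works for Google at least; YMMV with other search engines.)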
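The precedence that lets the Allow line win can be sketched in Python, assuming Google's documented behaviour: among matching rules the one with the longest pattern wins, Allow wins a tie, and a path with no matching rule is allowed. The helpers rule_matches and is_allowed and the trimmed RULES list are hypothetical illustrations, not Google's code.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Google-style match: '*' is a wildcard, trailing '$' anchors, else prefix."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


def is_allowed(rules, path):
    """Decide whether `path` may be crawled under one user-agent group.

    Assumes the longest matching pattern wins, Allow wins a tie,
    and no matching rule means the path is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:  # an empty Allow/Disallow value matches nothing
            continue
        if rule_matches(pattern, path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# A trimmed copy of the question's "User-agent: *" group plus the suggested Allow.
RULES = [
    ("Disallow", "/wp-admin"),
    ("Disallow", "/*?*"),
    ("Disallow", "/*?"),
    ("Allow", "/wp-content/uploads"),
    ("Allow", "/forum/sitemap-index.xml?page=1"),
]

for path in (
    "/forum/sitemap-index.xml?page=1",
    "/forum/sitemap-forums.xml?page=1",
    "/forum/sitemap_index.xml",
):
    print(path, is_allowed(RULES, path))
# /forum/sitemap-index.xml?page=1 True
# /forum/sitemap-forums.xml?page=1 False
# /forum/sitemap_index.xml True
```

The other URL from the question's error, /forum/sitemap-forums.xml?page=1, stays blocked and would need its own Allow line. Because rules are prefix matches, the Allow line above also covers ?page=10, ?page=11 and so on; ending the pattern with "$" would restrict it to exactly ?page=1.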

"URL blocked by robots.txt" message in Google Webmaster Tools