Question

I want to count clicks on an ads widget.

I have used a robots.txt file:

User-agent: *
Allow: /
Disallow: */ads_count/*

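I have also added nofollow to all links in that widget.

But many bots still follow the URLs in that widget. I record the client IP for every counted URL, and many of those IPs come from bots.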

Answer 1

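Have you tried removing the (*) before /ads_count? According to Google's SEO documentation, to block all bots (as you already do with User-agent: *), the rule looks like this: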

User-agent: *   # to whom? * means all bots
Disallow: /ads_count

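Note that directives are case-sensitive. For example, Disallow: /junk_file.asp blocks http://www.example.com/junk_file.asp but allows http://www.example.com/Junk_file.asp. Googlebot ignores whitespace (in particular empty lines) and unknown directives in robots.txt.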
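To make the case-sensitivity point concrete, here is a minimal Python sketch of plain prefix matching as described above. The helper name is_disallowed is hypothetical and only models the simple "path starts with the Disallow value" rule; it is not how any particular crawler is implemented.

```python
def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if `path` starts with any non-empty Disallow prefix.

    The comparison is case-sensitive, mirroring the behaviour described
    for robots.txt path values. (Hypothetical helper, for illustration.)
    """
    return any(bool(rule) and path.startswith(rule) for rule in disallow_rules)


rules = ["/junk_file.asp"]
print(is_disallowed("/junk_file.asp", rules))  # True: exact-case match is blocked
print(is_disallowed("/Junk_file.asp", rules))  # False: different case is not matched
```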

Answer 2

Allow, and the * wildcard in Disallow values, are not part of the original robots.txt specification, so not all robots.txt parsers know or honor these rules.

If you want to block all pages whose path starts with /ads_count/, you only need:

User-agent: *
Disallow: /ads_count/
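One way to sanity-check a rule like this before deploying it is Python's standard urllib.robotparser module. The sketch below parses the two lines above from a string; the bot name ExampleBot and the example.com URLs are placeholders, not anything from the question.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /ads_count/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a placeholder user agent; it falls under the "*" record.
widget_ok = parser.can_fetch("ExampleBot", "http://www.example.com/ads_count/42")
page_ok = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")

print(widget_ok)  # False: the path starts with /ads_count/
print(page_ok)    # True: no rule matches this path
```

This only shows how one parser reads the file; other crawlers may interpret robots.txt differently.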

However, not all bots respect robots.txt, so you will still get hits from bad bots that ignore it.
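Since robots.txt cannot stop such bots, one possible complement (not something the answers above prescribe) is to filter on the server before counting a click. The sketch below is an assumption-laden illustration: the function name should_count_click, the BOT_MARKERS substrings, and the blocked-IP set are all hypothetical, and a User-Agent check is easy for a hostile bot to evade.

```python
# Assumed, non-exhaustive substrings that often appear in crawler User-Agents.
BOT_MARKERS = ("bot", "crawler", "spider")


def should_count_click(user_agent: str, client_ip: str, blocked_ips: set[str]) -> bool:
    """Decide whether a click on the widget should be counted.

    Hypothetical helper: skips IPs already identified as bots and
    requests whose User-Agent is empty or contains a bot marker.
    """
    if client_ip in blocked_ips:
        return False
    ua = user_agent.strip().lower()
    if not ua:
        return False  # assumption: treat a missing User-Agent as a bot
    return not any(marker in ua for marker in BOT_MARKERS)


blocked = {"203.0.113.7"}  # documentation-range IP, used as an example only
print(should_count_click("Mozilla/5.0 (compatible; Googlebot/2.1)", "198.51.100.1", blocked))
print(should_count_click("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0", "198.51.100.2", blocked))
```

The IP list fits the question's setup, where client IPs are already recorded for each counted URL.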

How to block all search engines and bots from crawling certain URLs
