Question

What is the shortest way to block everything (*) and allow only the major search engines to index just the site's index page?

User-agent:  *
Disallow: /

User-agent: Googlebot
Disallow: /
Allow: index.html

User-agent: Slurp
Disallow: /
Allow: index.html

User-agent: msn
Disallow: /
Allow: index.html

Will this work?

Answer 1

Yes, that would be the shortest way. It is not necessarily the correct way, though.

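Not all robots support the Allow directive. Some robots also get confused about how to interpret robots.txt when both a User-agent: * section and a User-agent: Specific-bot section apply to them.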
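
To see why the outcome depends on the crawler, here is a minimal Python sketch of three rule-evaluation strategies. The function names are hypothetical and do not reproduce any real crawler's code. The rule is written as /index.html with a leading slash, which is an assumption (the file in the question omits it). Longest-match is the precedence that Google documents and that RFC 9309 describes; the other two stand in for simpler or older parsers.

```python
# Rules from one User-agent group, in file order: (directive, path prefix).
# "/index.html" with a leading slash is an assumption made for this sketch.
RULES = [("disallow", "/"), ("allow", "/index.html")]


def first_match(rules, path):
    """Hypothetical parser: the first rule whose prefix matches wins."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "allow"
    return True  # no rule matched, so the path is allowed


def longest_match(rules, path):
    """Hypothetical parser: the longest matching prefix wins."""
    matches = [(len(prefix), directive == "allow")
               for directive, prefix in rules if path.startswith(prefix)]
    if not matches:
        return True
    # On equal length, allow wins because True sorts after False.
    return max(matches)[1]


def ignores_allow(rules, path):
    """Hypothetical parser that only understands Disallow lines."""
    disallows = [rule for rule in rules if rule[0] == "disallow"]
    return first_match(disallows, path)


for strategy in (first_match, longest_match, ignores_allow):
    print(strategy.__name__, strategy(RULES, "/index.html"))
```

Only the longest-match strategy lets /index.html through. The other two block the whole site, which is the risk with the short version.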

To be sure it works, you would need to do the following:

User-agent: Googlebot
Disallow: /file1
Disallow: /file2
Disallow: /file3
# etc. until you have blocked every path except index.html

User-agent: Slurp
Disallow: /file1
Disallow: /file2
Disallow: /file3
# etc. until you have blocked every path except index.html

User-agent: msn
Disallow: /file1
Disallow: /file2
Disallow: /file3
# etc. until you have blocked every path except index.html

User-agent:  *
Disallow: /
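
Writing the long version by hand gets tedious, so it can be generated. A minimal sketch in Python, assuming you already have a list of the site's top-level paths; the build_robots_txt helper and the example paths are hypothetical.

```python
def build_robots_txt(paths, allowed_bots, keep="/index.html"):
    """Return robots.txt text that blocks every listed path except `keep`
    for the named bots, and blocks everything for all other bots."""
    blocked = [p for p in paths if p != keep]
    lines = []
    for bot in allowed_bots:
        lines.append(f"User-agent: {bot}")
        lines.extend(f"Disallow: {p}" for p in blocked)
        lines.append("")  # a blank line ends the group
    lines += ["User-agent: *", "Disallow: /"]
    return "\n".join(lines) + "\n"


# Example paths are placeholders; list your site's real paths here.
site_paths = ["/index.html", "/file1", "/file2", "/file3"]
print(build_robots_txt(site_paths, ["Googlebot", "Slurp", "msn"]))
```

Disallow values are matched as path prefixes, so the list must not contain "/" or any other prefix of /index.html, or the index page would be blocked as well.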

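If you don't want to do all that work, your best bet is to test each engine you care about and see whether it accepts your proposed robots.txt file. If it doesn't, try the longer version.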
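
As a quick local sanity check before testing against the engines themselves, the longer version can be run through Python's standard-library urllib.robotparser. This is only a sketch: that parser is not what Google, Yahoo, or MSN run, so it shows how one particular implementation reads the file, nothing more. The agent name SomeOtherBot is made up, and only the Googlebot group is included to keep the example short.

```python
from urllib.robotparser import RobotFileParser

# Longer version of the file, trimmed to one named bot for brevity.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /file1
Disallow: /file2
Disallow: /file3

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network access

checks = [
    ("Googlebot", "/index.html"),     # expected: allowed
    ("Googlebot", "/file1"),          # expected: blocked
    ("SomeOtherBot", "/index.html"),  # hypothetical bot, expected: blocked
]
for agent, path in checks:
    print(agent, path, parser.can_fetch(agent, path))
```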

Allow search engine indexing of index.html only