Question

I have URLs like example.com/post/alai-fm-sri-lanka-listen-online-1467/ and so on.

I want to use robots.txt to block all URLs that contain the word "post".

Is this the correct format?

Disallow: /post-*

Disallow: /?page=post

Disallow: /*page=post
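One way to see what these three rules would do is to test them against the example URL. The Python sketch below assumes Google/Bing-style matching: a "*" matches any run of characters, a trailing "$" anchors the end, and everything else must match as a prefix of the path plus query string. It ignores percent-encoding, and rule_matches is a hypothetical helper written for illustration, not a library function. Under those assumptions none of the three rules matches the example URL, because its path begins with "/post/" rather than "/post-", and it has no "page=post" query string.

```python
import re
from urllib.parse import urlsplit

# The three Disallow values proposed in the question.
QUESTION_RULES = ["/post-*", "/?page=post", "/*page=post"]


def rule_matches(pattern: str, url: str) -> bool:
    """Hypothetical helper: simplified Google/Bing-style matching.
    '*' matches any run of characters, a trailing '$' anchors the end,
    and the result must match from the start of path + query string."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    parts = urlsplit(url)
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    return re.match(regex, target) is not None


url = "http://example.com/post/alai-fm-sri-lanka-listen-online-1467/"
for rule in QUESTION_RULES:
    print(f"Disallow: {rule:<14} matches example URL: {rule_matches(rule, url)}")
```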

Answer 1

(Note that the file must be called robots.txt; I have corrected this in your question.)

You only included one example URL, in which "post" is the first path segment. If all your URLs look like that, the following robots.txt should work:

User-agent: *
Disallow: /post/

It blocks URLs such as:

http://example.com/post/
http://example.com/post/foobar
http://example.com/post/foo/bar
...

The following URLs are still allowed:

http://example.com/post
http://example.com/foo/post/
http://example.com/foo/bar/post
http://example.com/foo?page=post
http://example.com/foo?post=1
...
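Because this rule relies only on plain prefix matching, the lists above can be checked with Python's standard urllib.robotparser module, which, as far as I know, does not implement the "*" wildcard extension but does handle prefix rules like this one. A minimal sketch; "ExampleBot" is a made-up user agent that falls under the "User-agent: *" group.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested in this answer.
ROBOTS_TXT = """\
User-agent: *
Disallow: /post/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BLOCKED = [
    "http://example.com/post/",
    "http://example.com/post/foobar",
    "http://example.com/post/foo/bar",
]
ALLOWED = [
    "http://example.com/post",
    "http://example.com/foo/post/",
    "http://example.com/foo/bar/post",
    "http://example.com/foo?page=post",
    "http://example.com/foo?post=1",
]


def is_allowed(url: str) -> bool:
    # "ExampleBot" is hypothetical; with no group of its own,
    # the "User-agent: *" rules apply to it.
    return parser.can_fetch("ExampleBot", url)


for url in BLOCKED + ALLOWED:
    print("allowed" if is_allowed(url) else "blocked", url)
```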

Answer 2

Googlebot and Bingbot both support limited wildcards, so you could use:

Disallow: /*post

Of course, this would also disallow any URL containing "compost", "outpost", "poster", or any other word that contains the substring "post".
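To make the over-matching concrete, here is a minimal Python sketch of simplified Google/Bing-style wildcard matching: "*" matches any run of characters, a trailing "$" anchors the end, percent-encoding is ignored, and each candidate is given as a path plus optional query string. The helper name and the sample paths are hypothetical.

```python
import re


def rule_matches(pattern: str, target: str) -> bool:
    """Simplified Google/Bing-style matching: '*' matches any run of
    characters, a trailing '$' anchors the end, otherwise prefix match."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), target) is not None


RULE = "/*post"
SAMPLES = [
    "/post/alai-fm-sri-lanka-listen-online-1467/",
    "/garden/compost-tips",
    "/outpost",
    "/poster.png",
    "/foo?page=post",
    "/about",
]

# Map each sample path to whether "Disallow: /*post" would block it.
results = {target: rule_matches(RULE, target) for target in SAMPLES}
for target, blocked in results.items():
    print(f"{target:<46} {'blocked' if blocked else 'allowed'}")
```

Every sample except "/about" comes out as blocked, including the unrelated "compost", "outpost" and "poster" paths.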

You could try to make it a bit more precise. For example:

Disallow: /*/post    # any path segment after the first that starts with "post"
Disallow: /*?post=   # a "post" query parameter, if it is the first parameter
Disallow: /*=post    # any parameter value that starts with "post"
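The refined rules can be checked the same way. This Python sketch assumes simplified Google/Bing-style semantics ("*" matches any run of characters, a trailing "$" anchors the end, matching starts at the beginning of path plus query string, percent-encoding ignored); the helper names and sample paths are hypothetical. Under these semantics "/*/post" needs a slash followed by "post" somewhere after the leading slash, so a path whose first segment is "post", like the question's example, is not matched, and "/*?post=" misses a "post" parameter that is not the first one.

```python
import re


def rule_matches(pattern: str, target: str) -> bool:
    """Simplified Google/Bing-style matching: '*' matches any run of
    characters, a trailing '$' anchors the end, otherwise prefix match."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), target) is not None


# The three refined rules from the answer, without their comments.
RULES = ["/*/post", "/*?post=", "/*=post"]


def matching_rules(target: str) -> list:
    """Return the rules that would block this path (+ query string)."""
    return [rule for rule in RULES if rule_matches(rule, target)]


SAMPLES = [
    "/foo/post/1",         # a later segment starts with "post"
    "/foo/poster.png",     # also starts with "post", so it is blocked too
    "/foo/compost",        # "post" in the middle of a segment
    "/foo?post=1",         # "post" as the first query parameter
    "/foo?a=1&post=1",     # "post" parameter that is not first
    "/foo?page=post",      # a parameter value starting with "post"
    "/post/alai-fm-sri-lanka-listen-online-1467/",  # first segment
]

for target in SAMPLES:
    print(f"{target:<46} {matching_rules(target) or 'allowed'}")
```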

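But be aware that not all bots support wildcards, and not all of those that do handle them the same way. Bing and Google handle them correctly; there are no guarantees for other bots.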
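To see why wildcard support matters, compare a wildcard-aware crawler with a hypothetical one that follows only the original prefix-matching convention and treats "*" as an ordinary character. Both matchers below are simplified illustrations, not the behavior of any specific bot.

```python
import re


def wildcard_match(pattern: str, path: str) -> bool:
    """Simplified Google/Bing-style matching: '*' matches any run of
    characters and a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def literal_prefix_match(pattern: str, path: str) -> bool:
    """Hypothetical crawler without wildcard support: the rule value,
    '*' included, is just a prefix the path must start with."""
    return path.startswith(pattern)


RULE = "/*post"
for path in ["/post/1", "/garden/compost-tips", "/*post-literal"]:
    print(path,
          "wildcard-aware:", wildcard_match(RULE, path),
          "literal:", literal_prefix_match(RULE, path))
```

With the literal reading, "Disallow: /*post" blocks only paths that literally begin with "/*post", so for such a crawler the rule does almost nothing.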

Disallow dynamic URLs with robots.txt