Question

I need to keep search pages such as http://example.com/startup?page=2 from being indexed.

I want http://example.com/startup to be indexed, but not http://example.com/startup?page=2, page 3, and so on.

Also, "startup" can be anything, e.g. http://example.com/XXXXX?page

Answer 1

Something like this works, as confirmed by the "test robots.txt" function in Google Webmaster Tools:

User-Agent: *
Disallow: /startup?page=

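Disallow: The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved.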
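To make the prefix rule concrete, here is a minimal Python sketch of how a crawler might apply it. The function name `is_disallowed_by_prefix` is made up for illustration, and the sketch deliberately ignores Allow lines, per-agent groups, and URL escaping.

```python
def is_disallowed_by_prefix(path_and_query: str, disallow_values: list[str]) -> bool:
    """Return True if the URL's path+query starts with any Disallow value.

    An empty Disallow value means "nothing is disallowed", so it is skipped.
    """
    return any(
        path_and_query.startswith(value)
        for value in disallow_values
        if value
    )


rules = ["/startup?page="]

# /startup itself stays crawlable, every paginated variant is blocked.
print(is_disallowed_by_prefix("/startup", rules))
print(is_disallowed_by_prefix("/startup?page=2", rules))
print(is_disallowed_by_prefix("/XXXXX?page=2", rules))
```

Because the comparison is a plain prefix match, a single rule covers page=2, page=3, and every later page, but it does nothing for URLs whose first path segment differs.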

However, if the first part of the URL can change, you have to use wildcards:

User-Agent: *
Disallow: /startup?page=
Disallow: *page=
Disallow: *?page=
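The wildcard rules above can be checked with a small Python sketch. It assumes the wildcard semantics used by the major search engines (`*` matches any run of characters, a trailing `$` anchors the end of the URL); crawlers that only implement plain prefix matching will not read the rules this way. The helper names are hypothetical.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one Disallow value into a regex (assumed '*' and '$' semantics)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches_any_rule(path_and_query: str, rules: list[str]) -> bool:
    """Return True if any non-empty rule matches from the start of the URL."""
    return any(
        rule_to_regex(rule).match(path_and_query) is not None
        for rule in rules
        if rule
    )


rules = ["/startup?page=", "*page=", "*?page="]
print(matches_any_rule("/XXXXX?page=2", rules))
print(matches_any_rule("/XXXXX", rules))
```

Note that `*page=` is broader than `*?page=`: under these semantics it also matches a URL such as /a?subpage=1, so the narrower rule may be the safer choice.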

Answer 2

You can put this on the pages you do not want indexed:

<META NAME="ROBOTS" CONTENT="NONE">

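This tells robots not to index the page. (NONE is shorthand for NOINDEX, NOFOLLOW, so links on the page are not followed either.)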

On a search page, it may be more interesting to use:

<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">

This instructs robots not to index the current page but still to follow the links on it, so they can reach the pages found by the search.
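Since only the paginated URLs should carry the tag, the page template needs some way to decide when to emit it. A rough Python sketch, assuming pagination is always signalled by a `page` query parameter as in the question; `robots_meta_for` is a hypothetical helper, not part of any framework.

```python
from urllib.parse import parse_qs, urlsplit

NOINDEX_FOLLOW = '<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">'


def robots_meta_for(url: str) -> str:
    """Return the NOINDEX,FOLLOW tag for paginated URLs, else an empty string."""
    # keep_blank_values=True so that a bare "?page" still counts as paginated.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return NOINDEX_FOLLOW if "page" in query else ""


print(robots_meta_for("http://example.com/startup"))
print(robots_meta_for("http://example.com/startup?page=2"))
```

The returned string would be placed in the page's head section. The first path segment is never inspected, so this covers the "XXXXX" case without listing every possible name.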

Answer 3

Create a text file and name it: robots.txt
Add the User-agent and Disallow sections (see the sample below)
Place the file in the root of your site

Sample:

###############################
#My robots.txt file
#
User-agent: *
#
#list directories robots are not allowed to index 
#
Disallow: /testing/
Disallow: /staging/
Disallow: /admin/
Disallow: /assets/
Disallow: /images/
#
#
#list specific files robots are not allowed to index
#
Disallow: /startup?page=2
Disallow: /startup?page=3
# 
#
#End of robots.txt file
#
###############################
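If the file is generated rather than written by hand, the three steps above could be scripted roughly as follows. This is only a sketch: `build_robots_txt`, `write_robots_txt`, and the web root path are placeholders chosen for illustration.

```python
from pathlib import Path


def build_robots_txt(disallowed: list[str], user_agent: str = "*") -> str:
    """Build the file body: one User-agent line, then one Disallow line per entry."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed]
    return "\n".join(lines) + "\n"


def write_robots_txt(web_root: str, disallowed: list[str]) -> Path:
    """Write robots.txt into the site's root directory and return its path."""
    target = Path(web_root) / "robots.txt"
    target.write_text(build_robots_txt(disallowed), encoding="utf-8")
    return target


if __name__ == "__main__":
    # "/var/www/example" is a placeholder; use your site's real document root.
    print(build_robots_txt(["/testing/", "/admin/", "/startup?page="]))
```

Because Disallow values are matched as prefixes, a single `/startup?page=` entry covers every page number, so the individual page=2 and page=3 lines do not need to be listed one by one.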

Here is Google's actual robots.txt file.

You can find more information in blocking or removing pages using a robots.txt file.
