Question

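I'm using Heroku pipelines. When I push my app, it is deployed to the staging app

https://appname.herokuapp.com/

and, if everything is correct, I promote that app to production

https://appname.com/

There is no new build step: production runs the same app that was originally built for staging.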

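The problem is that this causes duplicate-content issues. The sites are clones of each other, exactly identical. I want to exclude the staging app from Google's index and from search engines in general.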

One approach I've considered is using a robots.txt file.

For that, I would write:

User-agent: *
Disallow: https://appname.herokuapp.com/

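I'm using an absolute URL because this file will be on the server in both the staging and the production app, and I only want to remove the staging app from Google's index without touching the production version.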

Is this the right approach?

Answer 1

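No, the Disallow field can't take a full URL. With your robots.txt, the value is treated as a path, so it would only block URLs whose path begins with the literal text https://appname.herokuapp.com/, not the staging host itself.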

The Disallow value always represents the beginning of a URL path.
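The prefix rule can be checked with Python's standard-library urllib.robotparser. This is only a sketch: that module implements the original prefix-matching rules rather than Google's own parser, and the paths /foo, /foobar and /bar are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse a robots.txt body and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The rule from the question: the value is compared against the URL *path*,
# which always starts with "/", so a full URL as the value never matches.
FULL_URL_RULES = "User-agent: *\nDisallow: https://appname.herokuapp.com/\n"

# A path rule (hypothetical path): blocks every path that *starts with* /foo.
PREFIX_RULES = "User-agent: *\nDisallow: /foo\n"

if __name__ == "__main__":
    print(is_allowed(FULL_URL_RULES, "https://appname.herokuapp.com/foobar"))  # True
    print(is_allowed(PREFIX_RULES, "https://appname.herokuapp.com/foobar"))    # False
    print(is_allowed(PREFIX_RULES, "https://appname.herokuapp.com/bar"))       # True
```

In other words, the question's file leaves every staging URL crawlable, while a path value such as /foo blocks /foobar and /foo/x alike because only the prefix is compared.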

To block all URLs under https://appname.herokuapp.com/, you would need:

Disallow: /

So https://appname.herokuapp.com/ and https://appname.com/ have to serve different robots.txt files.
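Since a pipeline promotion reuses the same build, a static robots.txt in the repository would be identical on both hosts. One possible approach is to generate the file per request from the Host header. Below is a minimal WSGI sketch, assuming appname.com is the only production hostname; PRODUCTION_HOSTS, robots_txt_for_host and app are hypothetical names, and in a real app this logic would live in whatever web framework it already uses.

```python
from wsgiref.simple_server import make_server

# Assumption: appname.com is the only production hostname; add any others
# (for example a www. variant) to this set.
PRODUCTION_HOSTS = {"appname.com"}

# An empty Disallow value means "nothing is disallowed".
ALLOW_ALL = "User-agent: *\nDisallow:\n"
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for_host(host: str) -> str:
    """Pick the robots.txt body from the Host header (port stripped, case-folded)."""
    hostname = host.split(":", 1)[0].lower()
    return ALLOW_ALL if hostname in PRODUCTION_HOSTS else DISALLOW_ALL


def app(environ, start_response):
    """Minimal WSGI app that only serves /robots.txt."""
    if environ.get("PATH_INFO") == "/robots.txt":
        body = robots_txt_for_host(environ.get("HTTP_HOST", "")).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]


if __name__ == "__main__":
    # Local try-out only: visit http://127.0.0.1:8000/robots.txt
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Unknown hosts fall through to DISALLOW_ALL, so the sketch fails closed: a newly added production domain stays blocked until it is listed in PRODUCTION_HOSTS.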

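If you don't mind bots crawling https://appname.herokuapp.com/, you could use noindex instead. But that, too, would require different behavior for the two sites. An alternative that doesn't require different behavior could be to use canonical. It tells crawlers which URL is preferred for indexing.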

<!-- on https://appname.herokuapp.com/foobar -->
<link rel="canonical" href="https://appname.com/foobar" />

<!-- on https://appname.com/foobar -->
<link rel="canonical" href="https://appname.com/foobar" />
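The difference between the two options can be sketched in Python: the canonical tag comes out the same on every host, while noindex has to depend on the host. This sketch sends noindex as an X-Robots-Tag response header (the alternative is a <meta name="robots" content="noindex"> tag); CANONICAL_ORIGIN, PRODUCTION_HOSTS, canonical_link and noindex_headers are hypothetical names. Note that crawlers only see noindex on pages they are allowed to fetch, which is why it pairs with letting bots crawl the staging host.

```python
from __future__ import annotations

import html

# Assumptions: production lives at https://appname.com and that is the
# only production hostname.
CANONICAL_ORIGIN = "https://appname.com"
PRODUCTION_HOSTS = {"appname.com"}


def canonical_link(path: str) -> str:
    """Host-independent: every copy of the site points at the production URL."""
    if not path.startswith("/"):
        path = "/" + path
    href = html.escape(CANONICAL_ORIGIN + path, quote=True)
    return f'<link rel="canonical" href="{href}" />'


def noindex_headers(host: str) -> list[tuple[str, str]]:
    """Host-dependent: only non-production hosts get X-Robots-Tag: noindex."""
    hostname = host.split(":", 1)[0].lower()
    if hostname in PRODUCTION_HOSTS:
        return []
    return [("X-Robots-Tag", "noindex")]


if __name__ == "__main__":
    print(canonical_link("/foobar"))  # <link rel="canonical" href="https://appname.com/foobar" />
    print(noindex_headers("appname.herokuapp.com"))  # [('X-Robots-Tag', 'noindex')]
    print(noindex_headers("appname.com"))  # []
```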

Answer 2

No, your suggestion would block all search engines/bots from accessing https://appname.herokuapp.com/.

What you should use is:

User-agent: Googlebot
Disallow: /

This only blocks Googlebot from accessing https://appname.herokuapp.com/. Note that bots can ignore the robots.txt file; it is more of a "please". Google, however, will do as you ask.
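How a group that names only Googlebot behaves can be checked with Python's urllib.robotparser. This is a sketch: that module's user-agent matching is simplified compared with real crawlers, the /foobar page is hypothetical, and, as with Disallow: /, this file would only belong on the staging host.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt body suggested above: one group aimed at Googlebot only.
GOOGLEBOT_ONLY = "User-agent: Googlebot\nDisallow: /\n"

parser = RobotFileParser()
parser.parse(GOOGLEBOT_ONLY.splitlines())

url = "https://appname.herokuapp.com/foobar"  # hypothetical page
for agent in ("Googlebot", "Googlebot/2.1", "Bingbot"):
    # With no "User-agent: *" group, agents that match no group are allowed.
    print(agent, parser.can_fetch(agent, url))
# Googlebot False
# Googlebot/2.1 False
# Bingbot True
```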

Edit

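After seeing unor's point that you can't disallow by URL, I have changed that in my answer. However, you can block specific paths, such as /appname/, or use / to block Googlebot from accessing anything.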