Question

I have a site, say:

http://domain.com/

with a mirror site at:

http://cdn.domain.com/

I don't want the CDN to be indexed. How do I write robots.txt rules that keep the CDN out of the index without disturbing my existing robots.txt exclusions?

My current robots.txt exclusions are:

User-agent: *
Disallow: /abc.php

How can I keep cdn.domain.com from being indexed?

Answer 1

Add the following to the .htaccess file in your site root:

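# Serve robots-cdn.txt to CloudFront's origin fetches. CloudFront's default
# User-Agent is "Amazon CloudFront"; the unescaped "." matches the space.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Amazon.CloudFront$
RewriteRule ^robots\.txt$ robots-cdn.txt [L]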

Then create a separate robots-cdn.txt:

User-agent: *
Disallow: /

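When the file is requested as http://cdn.domain.com/robots.txt, CloudFront fetches it from your server with its own user agent, so the contents of robots-cdn.txt are returned. Any other request does not trigger the rewrite, and the real robots.txt is served.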
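The decision the rewrite makes can be mirrored in a few lines of Python. This is a minimal sketch, not part of the original answer: `robots_file_for` is a hypothetical helper, and it assumes CloudFront fetches from your origin with its default User-Agent of `Amazon CloudFront`, which it uses unless it is configured to forward the viewer's User-Agent header. Python's `re` treats this simple pattern the same way Apache's regex engine does, so the sketch also shows why the unescaped `.` in `^Amazon.CloudFront$` works: it matches the space.

```python
import re

# Same pattern as the RewriteCond. The unescaped "." matches any single
# character, including the space in "Amazon CloudFront".
CDN_UA = re.compile(r"^Amazon.CloudFront$")


def robots_file_for(user_agent: str) -> str:
    """Hypothetical helper: which file Apache serves for /robots.txt."""
    if CDN_UA.search(user_agent):
        return "robots-cdn.txt"  # rewrite applies: the CDN copy blocks everything
    return "robots.txt"  # no match: the real robots.txt is served


if __name__ == "__main__":
    for ua in ["Amazon CloudFront", "Googlebot/2.1 (+http://www.google.com/bot.html)"]:
        print(repr(ua), "->", robots_file_for(ua))
```

Because the pattern is anchored with `^` and `$`, a user agent that merely contains "Amazon CloudFront" somewhere in a longer string does not trigger the rewrite.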

This way you can mirror the entire site, .htaccess file included, and still get the intended behavior.
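To sanity-check that intended behavior, the two files can be fed to Python's standard `urllib.robotparser`. This is a sketch under the assumption that its simple prefix matching is close enough to how major crawlers read rules this basic; real search engines use their own parsers. `parser_for` is a hypothetical helper, and the file contents are inlined so the check runs offline.

```python
from urllib.robotparser import RobotFileParser

# The two files from this answer, inlined so the check needs no network.
ROBOTS_TXT = "User-agent: *\nDisallow: /abc.php\n"
ROBOTS_CDN_TXT = "User-agent: *\nDisallow: /\n"


def parser_for(text: str) -> RobotFileParser:
    """Hypothetical helper: build a parser from text instead of fetching it."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())  # parse() takes an iterable of lines
    return rp


main = parser_for(ROBOTS_TXT)
cdn = parser_for(ROBOTS_CDN_TXT)

if __name__ == "__main__":
    for path in ["/", "/index.html", "/abc.php"]:
        print(
            path,
            "domain.com:", main.can_fetch("Googlebot", "http://domain.com" + path),
            "cdn.domain.com:", cdn.can_fetch("Googlebot", "http://cdn.domain.com" + path),
        )
```

On the main site only /abc.php is blocked, while every path on the CDN is blocked, so the existing exclusion is left untouched.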

Update:

Answer 2

If both sites run the same codebase, you can generate robots.txt dynamically and vary its contents according to the requested (sub)domain.
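A minimal sketch of that idea using only Python's standard library. The answer does not name a language or framework, so `http.server`, `robots_for_host`, `CDN_HOSTS` and the local demo server are illustrative assumptions. It chooses the robots.txt body from each request's `Host` header and, as a demo, starts a throwaway local server and requests the file once per host.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical: host names that should get the "block everything" robots.txt.
CDN_HOSTS = {"cdn.domain.com"}

MAIN_ROBOTS = "User-agent: *\nDisallow: /abc.php\n"
CDN_ROBOTS = "User-agent: *\nDisallow: /\n"


def robots_for_host(host: str) -> str:
    """Hypothetical helper: pick robots.txt contents for the requested host."""
    hostname = host.split(":", 1)[0].lower()  # drop an optional ":port" suffix
    return CDN_ROBOTS if hostname in CDN_HOSTS else MAIN_ROBOTS


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/robots.txt":
            self.send_error(404)
            return
        body = robots_for_host(self.headers.get("Host", "")).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    for host in ["domain.com", "cdn.domain.com"]:
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", "/robots.txt", headers={"Host": host})
        print(host, "->", conn.getresponse().read().decode().splitlines())
        conn.close()
    server.shutdown()
    server.server_close()
```

One caveat, assuming a pull CDN such as CloudFront sits in front of the origin: by default it may send your origin the origin's own host name rather than cdn.domain.com, in which case this check never sees the CDN host. In that setup, either configure the CDN to forward the Host header or key on the user agent as in Answer 1.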