Question

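I have www.domainname.com and origin.domainname.com pointing to the same codebase. Is there a way to prevent all URLs with the base name origin.domainname.com from being indexed?

Is there a rule in robots.txt that can do this? Both URLs point to the same folder. I also tried redirecting origin.domainname.com to www.domainname.com in the .htaccess file, but it doesn't seem to work.

If anyone who has had a similar problem can help, I would appreciate it.

Thanks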

Answer 1

You can rewrite robots.txt to a different file (let's call it 'robots_no.txt') that contains:

User-Agent: *
Disallow: /

(Source: http://www.robotstxt.org/robotstxt.html)

The .htaccess file would look like this:

RewriteEngine On
# For every host other than www.example.com, serve robots_no.txt instead
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^robots\.txt$ robots_no.txt [L]
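
Applied to the hostnames from the question, the same idea might look like the sketch below. It assumes that origin.domainname.com is the only host to hide from crawlers, that mod_rewrite is enabled, and that a robots_no.txt file with the Disallow rule above exists in the shared document root.

```apache
RewriteEngine On
# Assumption: only origin.domainname.com should be kept out of the index.
RewriteCond %{HTTP_HOST} ^origin\.domainname\.com$ [NC]
# Serve the blocking file in place of robots.txt for that host only.
RewriteRule ^robots\.txt$ robots_no.txt [L]
```

Requests for robots.txt on www.domainname.com do not match the condition, so they keep getting the regular file.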

To use a custom robots.txt for each (sub)domain:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^sub\.example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.example\.org$ [OR]
RewriteCond %{HTTP_HOST} ^example\.org$
# Rewrites the above (sub)domains <domain> to robots_<domain>.txt
# example.org -> robots_example.org.txt
RewriteRule ^robots\.txt$ robots_%{HTTP_HOST}.txt [L]
# In all other cases, use the default 'robots.txt'
RewriteRule ^robots\.txt$ - [L]

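Instead of asking search engines to block all pages on hosts other than www.example.com, you can also use <link rel="canonical">.

If http://example.com/page.html and http://example.org/~example/page.html both point to http://www.example.com/page.html, put the following tag in the <head>: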

<link rel="canonical" href="http://www.example.com/page.html">

See also Google's article about rel="canonical".

Answer 2

Using only .htaccess:

RewriteEngine On
# Match requests whose User-Agent contains any of these crawler names
RewriteCond %{HTTP_USER_AGENT} AltaVista [OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp
# Permanently redirect those crawlers to another site
RewriteRule ^.*$ http://htmlremix.com [R=301,L]
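
As written, these conditions match the listed crawlers on every hostname, including www. A sketch that limits the redirect to the origin host and sends crawlers to the same path on www could look like this. The hostnames come from the question; the choice of redirect target is an assumption and not part of the answer above.

```apache
RewriteEngine On
# Assumption: only requests arriving on origin.domainname.com are affected.
RewriteCond %{HTTP_HOST} ^origin\.domainname\.com$ [NC]
# Match any of the crawler names from the rules above in a single pattern.
RewriteCond %{HTTP_USER_AGENT} (AltaVista|Googlebot|msnbot|Slurp)
# Permanently redirect the crawler to the same path on the www host.
RewriteRule ^(.*)$ http://www.domainname.com/$1 [R=301,L]
```

Because the host condition never matches www.domainname.com, the redirected request is not rewritten again even though both hosts share one folder.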

How to stop search engines from indexing all URLs that start with origin.domainname.com
