Question

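I am developing a web application that lets users create their own web apps. For every new web app created through my application, I assign a new subdomain, e.g. subdomain1.xyzdomain.com, subdomain2.xyzdomain.com, and so on.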

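All of these web apps are stored in a database and served by a Python script (say default_script.py) kept in /var/www/. Until now, I have used robots.txt to block search engines from indexing that directory (/var/www/). This effectively blocks indexing of all my scripts, including default_script.py, as well as the content that default_script.py serves for the various web apps.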

But now I want some of those subdomains to be indexed.

After searching for a while, I found a way to block indexing of my scripts by listing them explicitly in robots.txt.

But I am still unsure about the following:

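Will blocking default_script.py from being indexed also block indexing of all the content served from default_script.py? If so, and I allow it to be indexed, will default_script.py itself start showing up in search results?
How can I selectively allow indexing of only some subdomains?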

For example: index subdomain1.xyzdomain.com but not subdomain2.xyzdomain.com.

Answer 1

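No. Search engines should not care which script generates the pages. As long as the pages generated by the web apps get indexed, you should be fine.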

Second question:

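You should create a separate robots.txt per subdomain. That is, when robots.txt is fetched from a particular subdomain, return only the robots.txt file that pertains to that subdomain. If you want a subdomain indexed, have its robots file allow everything. If you do not want it indexed, have its robots file deny everything.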
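As a rough illustration of the per-subdomain idea, here is a minimal Python sketch that picks a robots.txt body from the request's Host header. The names INDEXABLE_HOSTS and robots_txt_for_host are hypothetical and not part of the original setup; how the Host value reaches the function depends on how default_script.py is invoked.

```python
# Hypothetical allow-list of subdomains that should be indexed (an assumption
# for illustration; in practice this could come from the database).
INDEXABLE_HOSTS = {"subdomain1.xyzdomain.com"}

# An empty Disallow permits everything; "Disallow: /" blocks everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for_host(host: str) -> str:
    """Return the robots.txt body to serve for the given Host header value."""
    # Drop an optional ":port" suffix and normalize case before the lookup.
    hostname = host.split(":", 1)[0].strip().lower()
    return ALLOW_ALL if hostname in INDEXABLE_HOSTS else DENY_ALL
```

With this approach, any subdomain not in the allow-list falls back to deny-all, which matches the existing default of blocking indexing.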

Answer 2

To summarize the discussion, this is what my .htaccess file, kept in the /var/www/ directory, looks like:

Options +FollowSymlinks
RewriteEngine On
RewriteBase /

# The rule below serves a different robots.txt for subdomain1.
RewriteCond     %{HTTP_HOST}           ^subdomain1\.xyzdomain\.com$ [NC]
RewriteRule     ^(.*)robots\.txt       subdomain1-robots.txt [L]

# This rule applies to the rest of the subdomains and to xyzdomain.com.
RewriteRule     ^robots\.txt$          robots.txt [L]

# This rule serves content from default_script.py for everything other than robots.txt.
RewriteRule     .                      default_script.py
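The thread does not show the contents of the two robots files, so the following Python sketch assumes subdomain1-robots.txt allows everything and the default robots.txt denies everything. It uses the standard library's urllib.robotparser to check how a crawler would interpret each file; the helper name is_crawlable and the sample URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents (not shown in the original thread).
SUBDOMAIN1_ROBOTS = "User-agent: *\nDisallow:\n"   # allow everything
DEFAULT_ROBOTS = "User-agent: *\nDisallow: /\n"    # deny everything


def is_crawlable(robots_body: str, url: str, agent: str = "Googlebot") -> bool:
    """Report whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(is_crawlable(SUBDOMAIN1_ROBOTS, "http://subdomain1.xyzdomain.com/page"))
    print(is_crawlable(DEFAULT_ROBOTS, "http://subdomain2.xyzdomain.com/page"))
```

A check like this can be run against the files on disk before deploying, to confirm that only the intended subdomains are open to crawlers.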

Selectively indexing subdomains
