Question

I have a GitHub page in my repository username.github.io.

But I don't want Google to crawl my site, and I definitely don't want it to show up in search results.

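Will just using a robots.txt in GitHub Pages work? I know there are tutorials on stopping a GitHub repository from being indexed, but what about the actual GitHub page?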

Answer 1

I don't know whether this is still relevant, but Google says you can stop spiders with a meta tag:

<meta name="robots" content="noindex">

I'm not sure whether this works for all spiders or only for Google.
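If you want to check that a generated page really carries that tag, here is a minimal Python sketch using only the standard library's html.parser. The names RobotsMetaParser and has_noindex are hypothetical; the sketch only looks for a noindex token in the content of a meta tag named robots and ignores other ways of signalling noindex.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # tags like <meta ... /> are also routed through this method.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        if name == "robots":
            content = attr_map.get("content") or ""
            for token in content.split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def has_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(has_noindex(page))  # True
```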

Answer 2

Short answer:

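You can stop your user's GitHub Pages from being indexed by adding a robots.txt to your User Page. This robots.txt becomes the active robots.txt for all of your project pages, because project pages are served as subdirectories (username.github.io/project) of your subdomain (username.github.io).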

Longer answer:

You get your own subdomain for GitHub Pages (username.github.io). According to this question on MOZ and Google's reference, each subdomain has (and needs) its own robots.txt.

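This means that the valid, active robots.txt for project projectname of user username is located at username.github.io/robots.txt. You can put a robots.txt file there by creating a GitHub Pages site for your user.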
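The point that only the file at the root of the host counts can be illustrated with a small Python sketch. The function name robots_txt_url is hypothetical; it simply keeps the scheme and host of a page URL and replaces the path with /robots.txt, which is where crawlers look for the file.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Crawlers fetch robots.txt only from the root of the host, so a
    project page under username.github.io/projectname/ is covered by
    username.github.io/robots.txt, not by a file inside the project.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://username.github.io/projectname/index.html"))
# https://username.github.io/robots.txt
```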

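You do this by creating a new project/repository named username.github.io, where username is your username. You can then create a robots.txt file in the master branch of this project/repository, and it should be visible at username.github.io/robots.txt. More information about project, user, and organization pages can be found here.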
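If you prefer to generate the file rather than write it by hand, here is a minimal Python sketch. The helpers build_robots_txt and write_robots_txt are hypothetical, and the sketch assumes you have a local clone of the username.github.io repository; you would still commit and push the file to the master branch yourself.

```python
from pathlib import Path


def build_robots_txt(blocked_projects=None) -> str:
    """Build robots.txt text for the username.github.io repository.

    blocked_projects=None blocks the whole site (user page and all
    project pages); otherwise each listed project path is blocked.
    """
    lines = ["User-agent: *"]
    if blocked_projects is None:
        lines.append("Disallow: /")
    else:
        for name in blocked_projects:
            # One Disallow directive per line, as robots.txt requires.
            lines.append(f"Disallow: /{name.strip('/')}/")
    return "\n".join(lines) + "\n"


def write_robots_txt(repo_root: str, content: str) -> Path:
    """Write robots.txt to the root of a local clone of the repository."""
    path = Path(repo_root) / "robots.txt"
    path.write_text(content, encoding="utf-8")
    return path
```

For example, write_robots_txt("username.github.io", build_robots_txt(["projectname"])) would write a file that blocks only the projectname page.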

I have tested this with Google: I confirmed ownership of myusername.github.io by placing an HTML file in my project/repository https://github.com/myusername/myusername.github.io/tree/master, created a robots.txt file there, and then verified that my robots.txt works using Google Search Console webmaster tools (googlebot-fetch). Google does indeed list it as blocked, and Google Search Console webmaster tools (robots-testing-tool) confirms it.

To block robots for one project's GitHub Pages:

User-agent: *
Disallow: /projectname/
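You can sanity-check these rules locally with Python's standard urllib.robotparser before pushing them. This is only an approximation of how a crawler interprets the file (Google's own tester remains the authority), and the page URLs below are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The rules above, one directive per line as robots.txt requires.
rules = """\
User-agent: *
Disallow: /projectname/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

base = "https://username.github.io"
print(parser.can_fetch("Googlebot", base + "/projectname/index.html"))  # False
print(parser.can_fetch("Googlebot", base + "/"))                        # True
print(parser.can_fetch("Googlebot", base + "/otherproject/"))           # True
```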

To block robots for all of your user's GitHub Pages (the user page and all project pages):

User-agent: *
Disallow: /
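The same local check with urllib.robotparser shows that Disallow: / covers every path on the host, so both the user page and every project page are blocked for any user agent. Again, this is a sketch for quick verification with placeholder URLs, not a substitute for Google's robots testing tool.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

urls = [
    "https://username.github.io/",                   # user page
    "https://username.github.io/projectname/",       # a project page
    "https://username.github.io/other/page.html",    # any other path
]

# Every path starts with "/", so each URL matches the Disallow rule.
for agent in ("Googlebot", "Bingbot"):
    for url in urls:
        print(agent, url, parser.can_fetch(agent, url))  # always False
```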

Other options

Look into HTML meta tags

Look into custom domains (redirects) for GitHub Pages

Answer 3

Will just using a robots.txt in GitHub Pages work?

If you are using the default GitHub Pages subdomain, then no, because Google only checks https://github.io/robots.txt.

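You can make sure you don't have a master branch, or that your GitHub repo is a private one, although, as olavimmanuel commented and detailed in their answer, this won't change anything.

However, if you are using a custom domain with your GitHub Pages site, you can place a robots.txt file at the root of your repo and it will work as expected. One example of using this pattern is the Bootstrap repo.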

Answer 4

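Google does not recommend using a robots.txt file to keep a website (in this case, GitHub Pages) from being indexed. In fact, in most cases the page will still be indexed even if you block Googlebot.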

Instead, add the following to the head of your pages; this is easy to control even if you don't use a custom domain.

<meta name='robots' content='noindex,nofollow' />

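This tells Google not to index the page. If you only block Googlebot from accessing your site, the page will still get indexed roughly 90% of the time; it just won't show the meta description.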
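If your site consists of plain HTML files rather than a shared layout template, a small Python sketch like the following could insert the tag into each page's head. The helper add_noindex is hypothetical, it uses simple regular expressions rather than a full HTML parser, and it assumes each page has a literal head element; on a site with a common layout you would more likely add the tag to the layout once.

```python
import re

NOINDEX_TAG = "<meta name='robots' content='noindex,nofollow' />"


def add_noindex(html: str) -> str:
    """Insert the robots meta tag right after the opening <head> tag.

    Returns the page unchanged if it already has a robots meta tag.
    """
    if re.search(r"<meta[^>]+name=['\"]robots['\"]", html, re.IGNORECASE):
        return html
    # Match <head> or <head ...>, but not <header>.
    match = re.search(r"<head(\s[^>]*)?>", html, re.IGNORECASE)
    if match is None:
        raise ValueError("page has no <head> element")
    end = match.end()
    return html[:end] + "\n  " + NOINDEX_TAG + html[end:]


page = "<html><head><title>My page</title></head><body></body></html>"
print(add_noindex(page))
```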
