Question

I want to completely block Bing from crawling my site. It is hitting the site at an alarming rate (500 GB of data per month).

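I have added 1,000 subdomains in Bing Webmaster Tools, so I can't set the crawl rate for each one individually. I tried to block it with robots.txt, but that isn't working. Here is my robots.txt: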

# robots.txt 
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member
Disallow: bingbot
User-agent: ia_archiver
Disallow: /

Answer 1

This will certainly affect your SEO and search ranking and will cause pages to be dropped from the index, so use it with care.

If you have the IIS URL Rewrite module installed (if not, get it here), you can block requests based on the user-agent string.

Then add a rule like this to your web.config:

<system.webServer>
  <rewrite>
    <rules>
      <!-- Return 403 for any request whose User-Agent contains msnbot or bingbot (condition matching is case-insensitive by default). -->
      <rule name="Request Blocking Rule" stopProcessing="true">
        <match url=".*" />
        <conditions>
          <add input="{HTTP_USER_AGENT}" pattern="msnbot|BingBot" />
        </conditions>
        <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="You do not have permission to view this page." />
      </rule>
    </rules>
  </rewrite>
</system.webServer>

This returns a 403 for every request the bot makes to your site.
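To see what the condition actually matches, here is a rough Python stand-in for the rule's logic. It is not IIS code: it assumes the condition behaves like an unanchored, case-insensitive regex search (URL Rewrite conditions ignore case by default), and the user-agent strings are only examples.

```python
import re

# Same alternation as the rule's pattern attribute. re.IGNORECASE stands in for
# the condition's default case-insensitive matching, and search() (not match())
# mirrors a pattern that may occur anywhere in the User-Agent header.
BLOCKED_AGENTS = re.compile(r"msnbot|BingBot", re.IGNORECASE)


def status_for(user_agent: str) -> int:
    """Return 403 if the rule would block this user agent, else 200.

    The 200 is a placeholder for "the request continues to the site as usual".
    """
    return 403 if BLOCKED_AGENTS.search(user_agent) else 200


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    for ua in samples:
        print(status_for(ua), ua)
```

Because this is a substring search, any client that puts "bingbot" anywhere in its user agent is blocked, including scrapers that merely impersonate Bing.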

Update

Looking at your robots.txt, I think it should be:

# robots.txt
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
Disallow: /

User-agent: ia_archiver
Disallow: /

Answer 2

Your robots.txt is incorrect:

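You need a blank line between records (a record starts with one or more User-agent lines).
Disallow: bingbot disallows crawling URLs whose path starts with "bingbot" (i.e., http://example.com/bingbot), which is probably not what you want.
Not an error, but the empty Disallow: line isn't needed (allowing everything is already the default).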
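One way to confirm the second point is to feed the original file to Python's standard-library urllib.robotparser. Treat this as an approximation of Bing's parser, not a replica: robotparser does plain prefix matching with no wildcard support, and it matches user agents by product token, so the sketch passes "bingbot" rather than Bing's full user-agent string. example.com is a placeholder host.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, unchanged.
ORIGINAL = """\
# robots.txt
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member
Disallow: bingbot
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ORIGINAL.splitlines())

# "Disallow: bingbot" is read as a path rule inside the "*" record, so bingbot
# simply falls back to the "*" record and may crawl the site root.
print("bingbot:", parser.can_fetch("bingbot", "http://example.com/"))
# ia_archiver has its own record with "Disallow: /", so it is blocked.
print("ia_archiver:", parser.can_fetch("ia_archiver", "http://example.com/"))
```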

So you probably want to use:

User-agent: *
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
User-agent: ia_archiver
Disallow: /

This disallows crawling anything for "bingbot" and "ia_archiver". All other bots may crawl everything except URLs whose paths start with /member, /cgi-bin/, or *.axd.
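As a rough local check of that description, the corrected file can be run through Python's urllib.robotparser. The same caveats apply as for any local parser: it uses plain prefix matching and product-token user agents, so it only approximates what Bing does. The host and paths below are made-up examples, and allowed() is a hypothetical helper written for this sketch.

```python
from urllib.robotparser import RobotFileParser

# The corrected robots.txt from this answer.
CORRECTED = """\
User-agent: *
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(CORRECTED.splitlines())


def allowed(agent: str, path: str) -> bool:
    """True if `agent` may fetch `path` on the placeholder host example.com."""
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for agent in ("bingbot", "ia_archiver", "Googlebot"):
        for path in ("/", "/index.html", "/member/profile", "/cgi-bin/test"):
            print(f"{agent:12} {path:18} {allowed(agent, path)}")
```

bingbot and ia_archiver share one record, so a single Disallow: / covers both; every other agent falls through to the "*" record.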

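Note that under the original robots.txt specification, *.axd is interpreted literally (so bots won't crawl http://example.com/*.axd, but will crawl http://example.com/foo.axd). However, many bots extend the specification and interpret * as a kind of wildcard.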
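The difference between the two readings can be sketched in a few lines of Python. Both matchers are illustrative helpers written for this answer, not any crawler's actual code: literal_match treats the Disallow value as a plain path prefix (the original specification), while wildcard_match implements the common extension where * matches any run of characters and a trailing $ anchors the end of the URL. The rule is written here as /*.axd, with the leading slash that URL paths always begin with.

```python
import re


def literal_match(rule: str, path: str) -> bool:
    """Original robots.txt reading: the rule is a plain prefix of the path."""
    return path.startswith(rule)


def wildcard_match(rule: str, path: str) -> bool:
    """Extended reading: '*' matches any characters, a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape each literal piece, then join the pieces with '.*' where '*' was.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        pattern += "$"
    # re.match anchors at the start, so the rule still acts as a prefix.
    return re.match(pattern, path) is not None


if __name__ == "__main__":
    rule = "/*.axd"
    for path in ("/*.axd", "/foo.axd", "/scripts/WebResource.axd?d=1"):
        print(f"{path:30} literal={literal_match(rule, path)} "
              f"wildcard={wildcard_match(rule, path)}")
```

Under the literal reading only the path that contains an actual asterisk is blocked; under the wildcard reading every .axd URL is.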

Block bingbot from crawling my site