Question

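I have Rollbar set up in my Rails application. It keeps reporting RecordNotFound errors, which are caused by SEO crawlers (i.e. Googlebot, Baidu, findxbot, etc.) requesting deleted posts.

How can I prevent Rollbar from reporting SEO crawler activity?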

Answer 1

TL;DR:

# ./initializers/rollbar.rb
#
# https://stackoverflow.com/questions/36588449/how-to-prevent-rollbar-from-reporting-seo-crawlers-activities
# 
# frozen_string_literal: true

crawlers = %w[Facebot Twitterbot YandexBot bingbot AhrefsBot crawler MJ12bot Yahoo GoogleBot Mail.RU_Bot SemrushBot YandexMobileBot DotBot AppleMail SeznamBot Baiduspider]
regexp = Regexp.new(Regexp.union(*crawlers).source, Regexp::IGNORECASE)

Rollbar.configure do |config|
  ignore_bots = lambda do |options|
    agent = options.fetch(:scope).fetch(:request).call.fetch(:headers)['User-Agent']
    raise Rollbar::Ignore if agent.match?(regexp)
  end

  config.before_process << ignore_bots

  ...
end

======================

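Mind your Ruby version: the frozen_string_literal magic comment is only recognized from Ruby 2.3 on, and match? needs Ruby 2.4 or newer, so on older versions use =~ instead of match?.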

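Here I use an array that gets converted into a regexp. I did this because I want to prevent syntax and escaping-related mistakes by developers in the future, and I added IGNORECASE for the same reason.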

So in the resulting regexp you will see Mail\.RU_Bot. That is the escaping at work, not a mistake.
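To see the escaping for yourself, here is a minimal sketch using a shortened, hypothetical crawler list (the constant names are mine, not part of the initializer above). Regexp.union escapes regexp metacharacters in each string before joining them with |.

```ruby
# Hypothetical two-entry list, used only for illustration.
ESCAPE_DEMO_NAMES = %w[Mail.RU_Bot bingbot].freeze

# Same construction as in the initializer: union the names, then rebuild
# the regexp from its source with the IGNORECASE flag.
ESCAPE_DEMO_REGEXP = Regexp.new(Regexp.union(*ESCAPE_DEMO_NAMES).source, Regexp::IGNORECASE)

puts ESCAPE_DEMO_REGEXP.source                     # Mail\.RU_Bot|bingbot
puts ESCAPE_DEMO_REGEXP.match?('Mail.RU_Bot/2.0')  # true
puts ESCAPE_DEMO_REGEXP.match?('MailXRU_Bot/2.0')  # false, the dot is literal
```

Without the escaping, the dot would match any character and the last line would print true.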

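In your case you could also use the single word bot instead of a long list of crawlers, but watch out for unusual user agents. In my case I want to know about every crawler on my site, so I came up with this solution.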

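Another example of how partial matching works: my production site is visited by both crawler and crawler4j. I list only crawler in the array, and that suppresses notifications for both.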
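A small sketch of that partial matching, assuming a shortened list and made-up user-agent strings (the helper name listed_crawler? is hypothetical):

```ruby
# Hypothetical subset of the crawler list from the initializer.
PARTIAL_DEMO_NAMES = %w[crawler GoogleBot].freeze
PARTIAL_DEMO_REGEXP = Regexp.new(Regexp.union(*PARTIAL_DEMO_NAMES).source, Regexp::IGNORECASE)

# True when the user agent contains any listed name, ignoring case.
def listed_crawler?(agent)
  PARTIAL_DEMO_REGEXP.match?(agent.to_s)
end

# Made-up user-agent strings for illustration.
puts listed_crawler?('crawler4j (https://example.com/)')              # true, "crawler" is a substring
puts listed_crawler?('Mozilla/5.0 (compatible; Googlebot/2.1)')       # true, case is ignored
puts listed_crawler?('Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0') # false
```

The same substring behavior is why a very short entry such as bot is convenient but can also catch user agents you did not intend.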

I'm not sure whether agent.match?(regexp) is the right form or whether I should write regexp.match?(agent), but everything works.
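On the question of which side the regexp goes: both String#match? and Regexp#match? exist (Ruby 2.4+) and agree for string input. The one difference I am aware of shows up when a request has no User-Agent header at all, so the agent is nil. A minimal sketch, with a hypothetical pattern and helper names:

```ruby
# Hypothetical pattern, for illustration only.
ORDER_DEMO_REGEXP = /bingbot|Baiduspider/i

# Regexp as the receiver: Regexp#match? returns false for nil.
def bot_by_regexp?(agent)
  ORDER_DEMO_REGEXP.match?(agent)
end

# String as the receiver: same result for strings, but nil has no match? method.
def bot_by_string?(agent)
  agent.match?(ORDER_DEMO_REGEXP)
end

puts bot_by_regexp?('Mozilla/5.0 (compatible; bingbot/2.0)') # true
puts bot_by_string?('Mozilla/5.0 (compatible; bingbot/2.0)') # true
puts bot_by_regexp?(nil)                                     # false

begin
  bot_by_string?(nil)
rescue NoMethodError
  puts 'agent.match? raised NoMethodError for a nil agent'
end
```

So regexp.match?(agent) is the more forgiving form if a missing header is possible in your traffic.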

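One last thing: my solution is not particularly optimal, but it works. I hope someone will share an optimized version of my code. This is also the main reason I recommend sending data asynchronously, i.e. with sidekiq, delayed_job or whatever you prefer. Don't forget to check the related wiki.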

My answer is based on @AndrewSouthpaw's solution(?), which did not work for me. Hopefully the accepted, wiki-copy-pasted answer by @Jesse Gibbs will be moderated somehow.

=======

EDIT1: If you need to stop Rollbar from sending notifications from JS, take a look at the https://github.com/ZLevine/rollbar-ignore-crawler-errors repo.

Answer 2

It looks like you are using rollbar-gem, so you will want to use Rollbar::Ignore to tell Rollbar to ignore errors caused by spiders:

handler = proc do |options|
  raise Rollbar::Ignore if is_crawler_error(options)
end

Rollbar.configure do |config|
  config.before_process << handler
end

where is_crawler_error detects whether the request that led to the error came from a crawler.

If you are using rollbar.js to detect errors in client-side JavaScript, you can use the checkIgnore option to filter out client-side errors caused by bots.
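Back on the Ruby side, is_crawler_error is left undefined in this answer. One possible sketch follows. It assumes that options[:scope][:request] is either a hash or a callable returning one, with the request headers under :headers (the shape Answer 1 reads). That shape and the crawler pattern are assumptions to check against your own payloads.

```ruby
# Hypothetical pattern; extend it with the crawlers you actually see.
CRAWLER_AGENT_PATTERN = /Googlebot|bingbot|Baiduspider|findxbot/i

# Assumption: options[:scope][:request] is a hash, or a callable returning a
# hash, that keeps the request headers under the :headers key.
def is_crawler_error(options)
  request = options[:scope][:request]
  request = request.call if request.respond_to?(:call)
  return false unless request.is_a?(Hash)

  agent = (request[:headers] || {})['User-Agent']
  # Regexp#match? returns false for a nil agent, so a missing header is safe.
  CRAWLER_AGENT_PATTERN.match?(agent)
end

# Usage with a hand-built options hash (not a real Rollbar payload):
sample = { scope: { request: { headers: { 'User-Agent' => 'Mozilla/5.0 (compatible; Googlebot/2.1)' } } } }
puts is_crawler_error(sample) # true
```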

Answer 3

Here is what I did:

is_crawler_error = lambda do |options|
  # A lambda, so each `return` exits only this block.
  request = options[:scope][:request]
  return true if request['From'] == 'bingbot(at)microsoft.com'
  return true if request['From'] == 'googlebot(at)googlebot.com'
  return true if request['User-Agent'] =~ /Facebot|Twitterbot/
  false
end

handler = proc do |options|
  raise Rollbar::Ignore if is_crawler_error.call(options)
end

Rollbar.configure do |config|
  config.before_process << handler
end

Based on these docs.
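One Ruby detail matters for a predicate written with early returns like the one in this answer: inside a lambda, return exits only the lambda, while inside a Proc.new block it tries to return from the method that created the block. The sketch below shows the difference with hypothetical names and a deliberately simplified check. It does not touch Rollbar at all.

```ruby
# A lambda: `return` exits only the lambda itself.
LAMBDA_CHECK = lambda do |agent|
  return true if agent.include?('bot')
  false
end

# A Proc.new block: `return` tries to return from build_proc_check.
def build_proc_check
  Proc.new do |agent|
    return true if agent.include?('bot')
    false
  end
end

# Calls the proc after build_proc_check has already finished and
# reports either the result or the class of the error that was raised.
def proc_check_outcome(agent)
  build_proc_check.call(agent)
rescue LocalJumpError => e
  e.class
end

puts LAMBDA_CHECK.call('Twitterbot/1.0')   # true
puts proc_check_outcome('Twitterbot/1.0')  # LocalJumpError
puts proc_check_outcome('Firefox/120.0')   # false, no `return` was reached
```

If the predicate has to stay a plain proc, replacing return with next is the usual workaround.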
