Scrapy FakeUserAgentError: Error occurred during getting browser

Time: 2017-03-26 01:02:22

Tags: python linux web-scraping scrapy scrapy-middleware

I am using Scrapy FakeUserAgent and keep getting this error on my Linux server.

Traceback (most recent call last):
  File "/usr/local/lib64/python2.7/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "/usr/local/lib/python2.7/site-packages/scrapy_fake_useragent/middleware.py", line 27, in process_request
    request.headers.setdefault('User-Agent', self.ua.random)
  File "/usr/local/lib/python2.7/site-packages/fake_useragent/fake.py", line 98, in __getattr__
    raise FakeUserAgentError('Error occurred during getting browser')  # noqa
FakeUserAgentError: Error occurred during getting browser

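I keep getting this error on my Linux server when I run multiple spiders at the same time. It rarely happens on my own laptop. What should I do to avoid it? Do I have to increase the RAM or something else? The server has 512 MB of RAM and 1 vCPU.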

2 answers:

Answer 0: (score: 2)

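I'm not sure about the RAM, or why the error only happens on the Linux server with the minimal specs. I solved it by using the fake-useragent fallback feature. Unfortunately, scrapy-fake-useragent doesn't provide a convenient way to set the fallback, so I had to override the middleware in middlewares.py like this: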

from fake_useragent import UserAgent
from scrapy_fake_useragent.middleware import RandomUserAgentMiddleware

class FakeUserAgentMiddleware(RandomUserAgentMiddleware):
    def __init__(self, crawler):
        super(FakeUserAgentMiddleware, self).__init__(crawler)
        # If failed to get random user agent, use the most common one
        self.ua = UserAgent(fallback='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36')
        self.per_proxy = crawler.settings.get('RANDOM_UA_PER_PROXY', False)
        self.ua_type = crawler.settings.get('RANDOM_UA_TYPE', 'random')
        self.proxy2ua = {}

Then I activate the middleware in settings.py like this:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # 'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400, # disable the original middleware
    'myproject.middlewares.FakeUserAgentMiddleware': 400,
    # omitted
}
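
The fallback idea itself doesn't depend on Scrapy, so here is a minimal sketch of it in plain Python. The names pick_user_agent, provider and FALLBACK_UA are hypothetical and not part of fake-useragent or scrapy-fake-useragent; the fallback string is just the one from the middleware above.

```python
# Hypothetical fallback string, copied from the middleware above.
FALLBACK_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
               '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36')


def pick_user_agent(provider, fallback=FALLBACK_UA):
    """Return provider()'s result, or fallback if it raises or returns nothing.

    `provider` is any zero-argument callable (a hypothetical name), for
    example `lambda: ua.random` for a fake_useragent UserAgent instance `ua`.
    """
    try:
        user_agent = provider()
    except Exception:
        # Any ordinary exception raised while looking up a browser string
        # ends up here, and the fixed fallback is used instead.
        return fallback
    # Also guard against an empty or None result.
    return user_agent or fallback
```

Assuming ua is a UserAgent instance, something like pick_user_agent(lambda: ua.random) should then give you either a random user agent or the fixed string, instead of an exception in the middle of a crawl.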

Update

Try updating fake-useragent to version 0.1.5. I was using 0.1.4, and after upgrading, the problem was gone at the root, not just worked around by the fallback.

Answer 1: (score: 1)

Using fake_useragent 0.1.7 here, and I ran into the same problem.

But I have fixed it on my server. Here is the issue ticket with my suggestion for bypassing the error.

https://github.com/hellysmile/fake-useragent/issues/59

Hope that helps.