For some reason, Scrapy will no longer run on my machine. I've tried upgrading Scrapy, uninstalling it, and reinstalling it, no dice. Can anyone shed some light on this?
Here is the traceback:
Slevins-iMac:goodstuff slevin$ scrapy crawl chees
2017-01-28 18:20:38 [scrapy.utils.log] INFO: Scrapy 1.3.0 started (bot: goodstuff)
2017-01-28 18:20:38 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'goodstuff.spiders', 'SPIDER_MODULES': ['goodstuff.spiders'], 'USER_AGENT': 'GoodStuff (+http://www.goodstuff.com)', 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'goodstuff'}
2017-01-28 18:20:38 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
Unhandled error in Deferred:
2017-01-28 18:21:53 [twisted] CRITICAL: Unhandled error in Deferred:
2017-01-28 18:21:53 [twisted] CRITICAL:
Traceback (most recent call last):
File "/Users/slevin/Library/Python/2.7/lib/python/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
result = g.send(result)
File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/crawler.py", line 90, in crawl
six.reraise(*exc_info)
File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/crawler.py", line 72, in crawl
self.engine = self._create_engine()
File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/crawler.py", line 97, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/core/engine.py", line 69, in __init__
self.downloader = downloader_cls(crawler)
File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/core/downloader/__init__.py", line 88, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/middleware.py", line 40, in from_settings
mw = mwcls()
File "/Users/slevin/Documents/GoodStuff/Scrapers/goodstuff/goodstuff/middleware.py", line 7, in __init__
self.ua = UserAgent()
File "/Library/Python/2.7/site-packages/fake_useragent/fake.py", line 17, in __init__
self.load()
File "/Library/Python/2.7/site-packages/fake_useragent/fake.py", line 21, in load
self.data = load_cached()
File "/Library/Python/2.7/site-packages/fake_useragent/utils.py", line 138, in load_cached
update()
File "/Library/Python/2.7/site-packages/fake_useragent/utils.py", line 133, in update
write(load())
File "/Library/Python/2.7/site-packages/fake_useragent/utils.py", line 99, in load
browsers_dict[browser_key] = get_browser_versions(browser)
File "/Library/Python/2.7/site-packages/fake_useragent/utils.py", line 64, in get_browser_versions
html = get(settings.BROWSER_BASE_PAGE.format(browser=quote_plus(browser)))
File "/Library/Python/2.7/site-packages/fake_useragent/utils.py", line 29, in get
return urlopen(request, timeout=settings.HTTP_TIMEOUT).read()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1227, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1197, in do_open
raise URLError(err)
URLError: <urlopen error timed out>
I also tried upgrading Scrapy after installing 1.3.0, but I got a permission-denied error when pip tried to uninstall six-1.4.1.
Answer 0 (score: 1)
This issue has nothing to do with Scrapy or Twisted. As the log shows, you are using a custom middleware based on https://github.com/hellysmile/fake-useragent, which in turn connects to http://useragentstring.com/ to retrieve lists of browser versions, and the request to http://useragentstring.com/pages/useragentstring.php?name= is timing out. At the time of writing, that page is still unreachable.
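One way to keep the crawl alive when that remote lookup fails is to treat the fake-useragent data source as optional. Below is a minimal sketch, not the asker's actual middleware.py: the factory name, the fallback string, and the broad exception handling are all assumptions, and older fake-useragent versions may still attempt a network fetch inside `UserAgent()`.

```python
# Hypothetical defensive wrapper: if fake_useragent is missing or cannot
# reach its remote data source, fall back to a fixed User-Agent string
# instead of raising during middleware construction.

FALLBACK_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/537.36"

def make_user_agent_source():
    """Return a zero-argument callable that yields a User-Agent string."""
    try:
        from fake_useragent import UserAgent
        ua = UserAgent()              # may hit the network in old versions
        return lambda: ua.random      # random UA from the cached database
    except Exception:                 # ImportError, timeouts, parse errors
        return lambda: FALLBACK_UA    # degrade gracefully to a static UA
```

A middleware `__init__` can then call `make_user_agent_source()` once and never crash the engine at startup, which is exactly where the traceback above blows up.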
Personally, I consider a library like this (one that depends on a third-party server) to be real overhead. Consider using a library that generates fake user agents locally instead, such as https://pypi.python.org/pypi/user_agent
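If you want to avoid any extra dependency altogether, rotating over a static pool needs nothing beyond the standard library. A sketch of such a downloader middleware follows; the class name and the sample User-Agent strings are made up for illustration, and in a real project you would maintain a longer, current list.

```python
import random

# Hypothetical offline alternative: rotate User-Agent strings from a
# static pool, so no third-party service is ever contacted at startup.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/602.3.12 "
    "(KHTML, like Gecko) Version/10.0.2 Safari/602.3.12",
    "Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0",
]

class StaticUserAgentMiddleware(object):
    """Scrapy-style downloader middleware with no network access on init."""

    def process_request(self, request, spider):
        # Overwrite the User-Agent header with a random entry from the pool.
        request.headers['User-Agent'] = random.choice(USER_AGENTS)
```

Enabling it would work like any other entry in `DOWNLOADER_MIDDLEWARES`; since construction does no I/O, it cannot reproduce the timeout seen in the traceback.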