I am running a simple Scrapy spider against a Google link that serves the search results for "hello", but it fails with an error.
The spider code:
import scrapy
import re


class LinsSpider(scrapy.Spider):
    name = "lins"
    allowed_domains = ["www.google.com"]
    start_urls = ('https://www.google.co.in/?gfe_rd=cr&ei=78uyWPjFH8WL8Qe7kKf4BA#q=hello&*',)

    def parse(self, response):
        pagestr = "satanimant@gmail.com"
        yield {
            # the pattern has no capturing group, so take group(0), the whole match
            'asin': str(re.search(r"^[A-Za-z0-9\.\+_-]+@[A-Za-z0-9\._-]+\.[a-zA-Z]*$", pagestr).group(0).strip()),
        }
The error is:
2017-02-26 18:06:11 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-02-26 18:06:11 [scrapy] ERROR: Error downloading <GET http://www.google.com/>
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
result = f(*args, **kw)
File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/__init__.py", line 41, in download_request
return handler(request, spider)
File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 44, in download_request
return agent.download_request(request)
File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 211, in download_request
d = agent.request(method, url, headers, bodyproducer)
File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1631, in request
parsedURI.originForm)
File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1408, in _requestWithEndpoint
d = self._pool.getConnection(key, endpoint)
File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1294, in getConnection
return self._newConnection(key, endpoint)
File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1306, in _newConnection
return endpoint.connect(factory)
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/endpoints.py", line 788, in connect
EndpointReceiver, self._hostText, portNumber=self._port
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_resolver.py", line 174, in resolveHostName
onAddress = self._simpleResolver.getHostByName(hostName)
File "/usr/lib/python2.7/dist-packages/scrapy/resolver.py", line 21, in getHostByName
d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 276, in getHostByName
timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
2017-02-26 18:06:11 [scrapy] INFO: Closing spider (finished)
2017-02-26 18:06:11 [scrapy] INFO: Dumping Scrapy stats:
Please help me solve this. I am on Ubuntu 16.10.
Answer 0 (score: 1)
I found the problem. The installed Twisted version is too new; change it to 16.6.0 and it works!
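For reference, a minimal check (my own sketch, not part of the original answer): after downgrading, e.g. with pip install Twisted==16.6.0, you can confirm which Twisted version Scrapy will import, and also see why the traceback ends in TypeError, since the failing line "timeoutDelay = sum(timeout)" receives a plain float from Scrapy's resolver:

# Hedged sketch: confirm the installed Twisted version and illustrate the
# sum(timeout) failure shown in the traceback above.
import twisted

print(twisted.version)   # expect something like [twisted, version 16.6.0] after the downgrade

timeout = 60.0           # a bare float, like the one reaching twisted/internet/base.py
try:
    sum(timeout)         # the traceback's "timeoutDelay = sum(timeout)" does exactly this
except TypeError as err:
    print(err)           # 'float' object is not iterable

print(sum((timeout,)))   # a tuple/sequence is what sum() actually expects here

The downgrade appears to work because the error comes from the newer Twisted name-resolution path (twisted/internet/_resolver.py in the traceback) handing Scrapy's float DNS timeout straight to sum(); pinning Twisted at 16.6.0 avoids that path.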