How to make scrapy work

Time: 2017-04-20 08:37:30

Tags: python-3.x scrapy

I am new to scrapy. I followed the tutorial but could not make it work, even though every step matches the guide. I would like to know what the problem is. My settings are:

ROBOTSTXT_OBEY = True
DOWNLOAD_DELAY = 3
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 0
HTTPCACHE_DIR = 'httpcache'
HTTPCACHE_IGNORE_HTTP_CODES = []
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

And the spider is written as follows:

import re
import scrapy
from bs4 import BeautifulSoup
from scrapy.http import Request
from adddelay.items import AdddelayItem


class Myspider(scrapy.Spider):
    name = 'adddelay'
    allowed_domains = ['23us.com']
    bash_url = 'http://www.23us.com//class/'
    bashurl = '.html'

    def start_requests(self):
        for i in range(1, 11):
            url = self.bash_url + str(i) + '_1' + self.bashurl
            yield Request(url, self.parse)
        yield Request('http://www.23us.com/quanben/1', self.parse)

    def parse(self, response):
        print(response.text)
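For illustration, the URLs that `start_requests` builds can be reproduced outside the spider. Note the double slash that the trailing `//` in `bash_url` produces, which also shows up verbatim in the error log below:

```python
# Reproduce the URL construction from the spider's start_requests
bash_url = 'http://www.23us.com//class/'
bashurl = '.html'

urls = [bash_url + str(i) + '_1' + bashurl for i in range(1, 11)]
print(urls[0])    # -> http://www.23us.com//class/1_1.html
print(len(urls))  # -> 10
```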

and it raised the error "TypeError: 'float' object is not iterable"; part of the output is:

2017-04-20 16:16:58 [scrapy] INFO: Scrapy 1.1.1 started (bot: adddelay)
2017-04-20 16:16:58 [scrapy] INFO: Overridden settings: {'SPIDER_MODULES': 
['adddelay.spiders'], 'BOT_NAME': 'adddelay', 'NEWSPIDER_MODULE': 
'adddelay.spiders', 'DOWNLOAD_DELAY': 3, 'HTTPCACHE_ENABLED': True}
2017-04-20 16:16:58 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2017-04-20 16:16:59 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats',
'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware']
2017-04-20 16:16:59 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-04-20 16:16:59 [scrapy] INFO: Enabled item pipelines:
[]
2017-04-20 16:16:59 [scrapy] INFO: Spider opened
2017-04-20 16:16:59 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-04-20 16:16:59 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-04-20 16:16:59 [scrapy] ERROR: Error downloading <GET http://www.23us.com//class/1_1.html>
Traceback (most recent call last):
File "D:\Anaconda3\lib\site-packages\twisted\internet\defer.py", line 1299, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "D:\Anaconda3\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "D:\Anaconda3\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "D:\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
result = f(*args, **kw)
File "D:\Anaconda3\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
return handler.download_request(request, spider)
File "D:\Anaconda3\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 60, in download_request
return agent.download_request(request)
File "D:\Anaconda3\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 285, in download_request
method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
File "D:\Anaconda3\lib\site-packages\twisted\web\client.py", line 1631, in request
parsedURI.originForm)
File "D:\Anaconda3\lib\site-packages\twisted\web\client.py", line 1408, in _requestWithEndpoint
d = self._pool.getConnection(key, endpoint)
File "D:\Anaconda3\lib\site-packages\twisted\web\client.py", line 1294, in getConnection
return self._newConnection(key, endpoint)
File "D:\Anaconda3\lib\site-packages\twisted\web\client.py", line 1306, in _newConnection
return endpoint.connect(factory)
File "D:\Anaconda3\lib\site-packages\twisted\internet\endpoints.py", line 788, in connect
EndpointReceiver, self._hostText, portNumber=self._port
File "D:\Anaconda3\lib\site-packages\twisted\internet\_resolver.py", line 174, in resolveHostName
onAddress = self._simpleResolver.getHostByName(hostName)
File "D:\Anaconda3\lib\site-packages\scrapy\resolver.py", line 21, in getHostByName
d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
File "D:\Anaconda3\lib\site-packages\twisted\internet\base.py", line 276, in getHostByName
timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
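The bottom frame of the traceback is the key: `twisted.internet.base.getHostByName` calls `sum(timeout)`, which expects an iterable of retry delays, but a bare float was passed through from Scrapy's resolver. A minimal sketch of the failing call (the value `60.0` is illustrative, not taken from the log):

```python
# sum() requires an iterable; passing a single float raises exactly
# the TypeError seen in the traceback above.
timeout = 60.0  # a bare float, as old Scrapy passed to newer Twisted
try:
    sum(timeout)
except TypeError as e:
    print(e)  # -> 'float' object is not iterable
```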
2017-04-20 16:17:03 [scrapy] ERROR: Error downloading <GET http://www.23us.com//class/2_1.html>
2017-04-20 16:17:08 [scrapy] ERROR: Error downloading <GET http://www.23us.com//class/3_1.html>
2017-04-20 16:17:12 [scrapy] ERROR: Error downloading <GET http://www.23us.com//class/4_1.html>
(the same traceback is repeated for each request)

I forgot that scrapy cannot be debugged in PyCharm without an entry point; the following code should be placed in your scrapy project's root directory:

from scrapy.cmdline import execute
execute(['scrapy', 'crawl', 'adddelay'])

I have now solved this problem.

1 Answer:

Answer 0: (score: 0)

I figured out the problem: the console raised it because my scrapy version was 1.0.x while the Twisted version was 17.1.1. One can solve this by installing Scrapy 1.3.3; Scrapy 1.3.3 works well with Twisted 17.1.1.
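Concretely, the fix described above amounts to pinning Scrapy to a release that is compatible with the installed Twisted (a sketch; adjust to your own environment):

```shell
# Pin Scrapy to a release compatible with Twisted 17.1.x
pip install scrapy==1.3.3

# Verify which versions are actually installed
python -c "import scrapy, twisted; print(scrapy.__version__, twisted.version.short())"
```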