Could someone help me find what I'm missing? My code throws exceptions.ImportError: No module named middlewares, and I can't find a solution.
My folder structure is:
Here is DOWNLOADER_MIDDLEWARES in settings.py:
DOWNLOADER_MIDDLEWARES = {
    'IpRotation.middleware.DmozSpider': 543,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'IpRotation.ProxyMiddleware.ProxyMiddleware': 800,
    'scrapy.downloadermiddleware.useragent.UserAgentMiddleware': None,
    'IpRotation.RotateUserAgentMiddleware.RotateUserAgentMiddleware': 350,
}
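An ImportError like this usually means that one of the dotted paths in DOWNLOADER_MIDDLEWARES does not resolve to an importable module. Below is a minimal sketch for checking that, using only the standard library. The helper name find_unimportable is hypothetical and not part of Scrapy, and the sample paths are standard-library stand-ins so the snippet runs without the project on the path:

```python
import importlib


def find_unimportable(middlewares):
    """Return {dotted_path: reason} for entries whose module or class cannot be loaded."""
    problems = {}
    for dotted_path in middlewares:
        # Split 'package.module.ClassName' into module path and class name.
        module_path, _, class_name = dotted_path.rpartition(".")
        try:
            module = importlib.import_module(module_path)
        except (ImportError, ValueError) as exc:
            problems[dotted_path] = "cannot import module: {}".format(exc)
            continue
        if not hasattr(module, class_name):
            problems[dotted_path] = "module has no attribute {!r}".format(class_name)
    return problems


# Stand-in paths from the standard library (NOT the project's real settings).
sample = {
    "json.decoder.JSONDecoder": 543,   # module imports, class exists
    "json.decoders.JSONDecoder": 800,  # misspelled module name
    "json.decoder.NoSuchClass": 350,   # module imports, class is missing
}
problems = find_unimportable(sample)
```

To use it on the real project, pass the actual DOWNLOADER_MIDDLEWARES dict while running from the directory that contains scrapy.cfg. Each module part has to match a file inside the IpRotation package, so 'IpRotation.ProxyMiddleware.ProxyMiddleware' expects a file IpRotation/ProxyMiddleware.py. Note also that the settings spell the Scrapy package both as downloadermiddlewares and as downloadermiddleware; the package shipped with Scrapy 1.0 and later is scrapy.downloadermiddlewares.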
My spider:
import scrapy

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
My custom UserAgentMiddleware.py:
import logging
import random
import scrapy
from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware

class RotateUserAgentMiddleware(UserAgentMiddleware):
    def __init__(self, user_agent=''):
        self.user_agent = user_agent

    def process_request(self, request, spider):
        user_agent_list = [....]
        ua = random.choice(user_agent_list)
        if ua:
            request.headers.setdefault('User-Agent', ua)
            spider.log(
                u'User-Agent: {} {}'.format(request.headers.get('User-Agent'), request)
            )
My custom IPRotationMiddleWare.py:
import random
from scrapy.downloadermiddlewares.httpproxy import HttpProxyMiddleware

class ProxyMiddleware(HttpProxyMiddleware):
    def __init__(self, proxy_ip=''):
        self.proxy_ip = proxy_ip

    def process_request(self, request, spider):
        ip = random.choice(self.proxy_list)
        if ip:
            request.meta['proxy'] = ip

    proxy_list = [.......]
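I can't work out what is causing the "No module named middlewares" exception. Also, what is the difference between a spider middleware and a downloader middleware?
Error thrown: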
TypeError: argument of type 'NoneType' is not iterable
2015-10-12 18:29:34 [scrapy] ERROR: Error downloading <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/>
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\twisted\internet\endpoints.py", line 542, in connect
timeout=self._timeout, bindAddress=self._bindAddress)
File "C:\Python27\lib\site-packages\twisted\internet\posixbase.py", line 482, in connectTCP
c = tcp.Connector(host, port, factory, timeout, bindAddress, self)
File "C:\Python27\lib\site-packages\twisted\internet\tcp.py", line 1165, in __init__
if abstract.isIPv6Address(host):
File "C:\Python27\lib\site-packages\twisted\internet\abstract.py", line 522, in isIPv6Address
if '%' in addr:
TypeError: argument of type 'NoneType' is not iterable
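The traceback shows Twisted being handed None as the host to connect to. One possible cause, which this post cannot confirm because proxy_list is elided, is a proxy entry written without a scheme (for example 'host:port'), from which no hostname can be parsed. Below is a small standard-library sketch for checking entries before they are assigned to request.meta['proxy']. The helper name is_usable_proxy and the sample addresses are made up for illustration:

```python
# On Python 2.7 (as in the traceback) use: from urlparse import urlparse
from urllib.parse import urlparse


def is_usable_proxy(proxy_url):
    """True if the URL has an http(s) scheme and a parseable hostname."""
    parts = urlparse(proxy_url)
    return parts.scheme in ("http", "https") and parts.hostname is not None


# Hypothetical entries; the real proxy_list is not shown in the post.
candidates = [
    "http://10.0.0.1:8080",       # scheme and host present
    "10.0.0.1:8080",              # no scheme, so no hostname is parsed
    "https://proxy.example:3128",
]
usable = [p for p in candidates if is_usable_proxy(p)]
```

If every entry in the real list passes this check, the None host probably comes from somewhere else, for example from the overridden __init__ that skips the setup done by HttpProxyMiddleware; that is also an assumption, not something the log proves.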