I am using scrapy + scrapy_splash and want every request to go through a proxy, so I tried to set this up in middlewares.py. Here is the proxy-related part of my middlewares.py:
    class ProxyMiddleware(object):
        def process_request(self, request, spider):
            IPPOOL=eval(requests.get("http://192.168.89.190:8000/").text)
            random_choose=random.choice(IPPOOL)
            input("1111111111111111111111111111111111")
            print(random_choose)
            input("222222222222222222222222222222222")
            proxy_addr="http://"+str(random_choose[0])+":"+str(random_choose[1])
            request.meta['splash']['args']['proxy'] = proxy_addr
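As a side note on the pool lookup in this middleware: calling `eval` on a response body executes whatever the server sends back. Below is a minimal sketch of a safer parse, assuming the pool endpoint returns a Python-literal list of (host, port) pairs (the question does not show the actual response format). `parse_proxy_pool` and `build_proxy_url` are hypothetical helper names, and the sample text is made up for illustration.

```python
import ast
import random


def parse_proxy_pool(text):
    """Parse a Python-literal list of (host, port) pairs without executing code."""
    pool = ast.literal_eval(text)
    if not isinstance(pool, (list, tuple)) or not pool:
        raise ValueError("proxy pool response is empty or not a list")
    return list(pool)


def build_proxy_url(entry):
    """Build an http:// proxy URL from a (host, port) pair."""
    host, port = entry[0], entry[1]
    return "http://{}:{}".format(host, port)


# Assumed sample response body; the real endpoint's format is not shown in the question.
sample_text = "[('10.0.0.1', 8080), ('10.0.0.2', 3128)]"
pool = parse_proxy_pool(sample_text)
proxy_addr = build_proxy_url(random.choice(pool))
print(proxy_addr)
```

In the middleware, `sample_text` would be replaced by the text of the HTTP response from the pool service.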
In settings.py I set the following:
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        'crawler.middlewares.ProxyMiddleware': 843,
    }
My spider code is:
    class Exp10itSpider(scrapy.Spider):
        name = "exp10it"
        collected_urls=[]
        domain=""
        start_url=""
        lua_script = """
        function main(splash, args)
          assert(splash:go{splash.args.url,http_method=splash.args.http_method,body=splash.args.body})
          assert(splash:wait(0.5))
          return splash:html()
        end
        """

        def start_requests(self):
            urls = [
                #'https://www.bing.com'
                #'https://httpbin.org/post^sss=lalala'
                #'http://www.freebuf.com'
                'http://www.ip138.com/'
                #'http://geekpwn.freebuf.com'
            ]
            self.domain=urlparse(urls[0]).hostname
            self.start_url=urls[0]
            for url in urls:
                yield SplashRequest(url, self.parse_get, endpoint='execute',
                                    magic_response=True, meta={'handle_httpstatus_all': True},
                                    args={'lua_source': self.lua_script})
When I run my spider, it reports an error in middlewares.py. The debug output is:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
        result = g.send(result)
      File "/usr/local/lib/python3.5/dist-packages/scrapy/core/downloader/middleware.py", line 37, in process_request
        response = yield method(request=request, spider=spider)
      File "/root/mypypi/crawler/crawler/middlewares.py", line 70, in process_request
        request.meta['splash']['args']['proxy'] = proxy_addr
    KeyError: 'splash'
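I think the error message is telling me that I should set something on the SplashRequest, and that I cannot simply assign to request.meta['splash']['args']['proxy'], but I don't know how to do that. Can you help me?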
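The traceback shows that `request.meta` has no `'splash'` key for at least one request that reaches `process_request`. One possible explanation, which is an assumption the question does not confirm, is that a request not created via `SplashRequest` (for example a robots.txt fetch) also passes through the middleware. Below is a minimal sketch of a guard against that case. It uses plain dicts in place of `request.meta` so that it runs standalone, and `set_splash_proxy` is a hypothetical helper name.

```python
def set_splash_proxy(meta, proxy_addr):
    """Set the Splash 'proxy' argument only if the request carries Splash options.

    Returns True if the proxy was set, False if meta has no 'splash' entry.
    """
    splash_options = meta.get('splash')
    if splash_options is None:
        # Not a Splash request: leave it untouched instead of raising KeyError.
        return False
    # Create the 'args' dict if it is missing, then store the proxy address.
    splash_options.setdefault('args', {})['proxy'] = proxy_addr
    return True


# Stand-ins for request.meta of a Splash request and of a plain request.
splash_meta = {'splash': {'endpoint': 'execute', 'args': {'lua_source': '...'}}}
plain_meta = {'handle_httpstatus_all': True}

print(set_splash_proxy(splash_meta, 'http://10.0.0.1:8080'))  # True
print(set_splash_proxy(plain_meta, 'http://10.0.0.1:8080'))   # False
```

Inside the middleware this would be called as `set_splash_proxy(request.meta, proxy_addr)`. The sketch only avoids the `KeyError`. Whether Splash then applies the proxy value is a separate question it does not address.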