How to set up middlewares.py to use a proxy with scrapy_splash's SplashRequest

Asked: 2017-11-17 07:35:03

Tags: proxy scrapy scrapy-splash

I am using Scrapy together with scrapy-splash, and I want every request to go through a proxy, so I tried to set this up in middlewares.py. Here is the proxy-related part of my middlewares.py:

import random

import requests


class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # The pool service returns a Python-literal list of (ip, port) pairs
        IPPOOL = eval(requests.get("http://192.168.89.190:8000/").text)
        random_choose = random.choice(IPPOOL)
        proxy_addr = "http://" + str(random_choose[0]) + ":" + str(random_choose[1])
        request.meta['splash']['args']['proxy'] = proxy_addr

I registered the middleware in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'crawler.middlewares.ProxyMiddleware': 843,
}
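As background (a general Scrapy behavior, not something stated in the question): downloader middlewares' `process_request` hooks are called in ascending priority order, so with these settings `ProxyMiddleware` (843) runs after `scrapy_splash.SplashMiddleware` (725), i.e. after the Splash middleware has already processed the request. A quick way to check the call order:

```python
# Downloader middleware priorities from settings.py; process_request hooks
# run in ascending priority order (lowest number first).
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'crawler.middlewares.ProxyMiddleware': 843,
}

call_order = sorted(DOWNLOADER_MIDDLEWARES, key=DOWNLOADER_MIDDLEWARES.get)
```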

My spider code is:

class Exp10itSpider(scrapy.Spider):
    name = "exp10it"
    collected_urls=[]
    domain=""
    start_url=""
    lua_script = """
    function main(splash, args)
      assert(splash:go{splash.args.url,http_method=splash.args.http_method,body=splash.args.body})
      assert(splash:wait(0.5))
      return splash:html()
    end
    """


    def start_requests(self):
        urls = [
            #'https://www.bing.com'
            #'https://httpbin.org/post^sss=lalala'
            #'http://www.freebuf.com'
            'http://www.ip138.com/'
            #'http://geekpwn.freebuf.com'
        ]
        self.domain=urlparse(urls[0]).hostname
        self.start_url=urls[0]
        for url in urls:
            yield SplashRequest(url, self.parse_get, endpoint='execute',
                                    magic_response=True, meta={'handle_httpstatus_all': True},
                                    args={'lua_source': self.lua_script})

When I run my spider, it fails inside middlewares.py with the traceback below:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.5/dist-packages/scrapy/core/downloader/middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "/root/mypypi/crawler/crawler/middlewares.py", line 70, in process_request
    request.meta['splash']['args']['proxy'] = proxy_addr
KeyError: 'splash'

I think the error message means I need to set something on the SplashRequest so that request.meta['splash']['args']['proxy'] can be assigned, but I don't know how to do that. Can you help me?
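Since the question has no accepted answer, the following is only a sketch of one possible fix, not a verified solution: guard the middleware with `dict.setdefault` so the nested `splash`/`args` keys are created when missing instead of raising `KeyError`. The `PROXY_POOL` list here is a hypothetical stand-in for the asker's IP-pool service:

```python
import random


class ProxyMiddleware(object):
    # Hypothetical stand-in for the IP pool fetched from
    # http://192.168.89.190:8000/ in the question.
    PROXY_POOL = [("127.0.0.1", 8118), ("127.0.0.1", 8119)]

    def process_request(self, request, spider):
        host, port = random.choice(self.PROXY_POOL)
        proxy_addr = "http://%s:%s" % (host, port)
        # Create the nested keys when absent instead of assuming they exist,
        # which avoids the KeyError: 'splash' shown in the traceback.
        splash_meta = request.meta.setdefault('splash', {})
        splash_meta.setdefault('args', {})['proxy'] = proxy_addr
```

Another option worth trying is to register the middleware with a priority below 725 so it runs before `scrapy_splash.SplashMiddleware`, while the `meta['splash']` dict populated by `SplashRequest` is presumably still in place.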

0 Answers:

No answers yet.