Using a Deferred in a Scrapy downloader middleware

Asked: 2014-11-03 09:31:17

Tags: python scrapy twisted

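I need to run some blocking code in a Scrapy downloader middleware (it waits for a free proxy). I was planning to use this method.

But it doesn't really work in a downloader middleware, because the return value of process_request(self, request, spider) is expected to satisfy isinstance(response, (Response, Request)).

What is the best way to do this?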

2 answers:

Answer 0: (score: 0)

As @gallecio notes here, any downloader middleware method may also return a Deferred.

https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.downloadermiddlewares.DownloaderMiddleware

So you can simply do something like this:

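from twisted.internet import defer, reactor

class DelayMiddleware:  # illustrative class name
    def heavy_processing(self):
        # Placeholder. Its return value becomes the Deferred's result,
        # so it should be None, a Response, or a Request.
        pass

    def process_request(self, request, spider):
        d = defer.Deferred()
        # Wrap the call in a lambda so heavy_processing runs when the timer
        # fires, not immediately while the timer is being scheduled.
        reactor.callLater(15, lambda: d.callback(self.heavy_processing()))
        spider.logger.info("Returning item in a few seconds...")
        return d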

Answer 1: (score: 0)

You can use Twisted's deferToThread to run blocking code without blocking the main thread:

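from twisted.internet.threads import deferToThread

class DownloaderMiddleware:
    def process_request(self, request, spider):
        # The returned Deferred fires once the worker thread finishes.
        return deferToThread(self.run_blocking_code_in_different_thread, request, spider)

    def run_blocking_code_in_different_thread(self, request, spider) -> None:
        print("Code will block here on a different thread and won't stop MainThread")
        # get_proxy_blocking_call is the asker's own blocking helper (not defined here).
        request.meta["proxy"] = get_proxy_blocking_call()
        # Return None so Scrapy continues processing this request; returning
        # the request itself would send it back to the scheduler.
        return None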
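
The get_proxy_blocking_call helper is not defined in the answer. Here is a minimal sketch of what it might look like, assuming free proxies are tracked in a standard-library queue.Queue. The _free_proxies queue, the release_proxy function, and the timeout parameter are all hypothetical names introduced for illustration, not part of Scrapy or Twisted:

```python
import queue
from typing import Optional

# Hypothetical pool: a proxy is put back on this queue whenever it becomes free.
_free_proxies: "queue.Queue[str]" = queue.Queue()


def release_proxy(proxy: str) -> None:
    """Mark a proxy as free again (hypothetical helper)."""
    _free_proxies.put(proxy)


def get_proxy_blocking_call(timeout: Optional[float] = None) -> str:
    """Block the calling thread until a proxy is free, then return it.

    With timeout=None this waits indefinitely; otherwise queue.Empty is
    raised if no proxy becomes free within `timeout` seconds.
    """
    return _free_proxies.get(timeout=timeout)
```

Because queue.Queue is thread-safe, such a helper could be called from the deferToThread worker while other code calls release_proxy from a different thread. Passing a timeout would keep a worker thread from waiting forever if no proxy is ever freed.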