Question

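When using a Scrapy downloader middleware, suppose you don't find what you're looking for in the response. Do you build a new Response object and return that, or do you return the response variable that was passed to process_response?

I tried the latter, but kept getting "response has no attribute selector" when using it together with FilesPipeline.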

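import os

from PIL import Image  # Image.open is presumably Pillow's

# parse_xpath, download_file and CAPTCHA_PATTERN are the asker's own
# helpers/constants; their definitions are not shown in the question.


class CaptchaMiddleware(object):

    def process_response(self, request, response, spider):
        download_path = spider.settings['CAPTCHA_STORE']

        # 1

        captcha_images = parse_xpath(response, CAPTCHA_PATTERN, 'image')
        if captcha_images:
            for url in captcha_images:
                url = response.urljoin(url)
                print("Downloading %s" % url)
                download_file(url, os.path.join(download_path, url.split('/')[-1]))

            # os.listdir() returns bare file names, so join them with the directory
            for image in os.listdir(download_path):
                Image.open(os.path.join(download_path, image))

        # 2
        return response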

If I return at #1, FilesPipeline works fine and downloads the files, but if I return at #2, I get the error "response has no attribute selector".

Answer 1

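From the docs:

process_response(request, response, spider)

process_response() should either: return a Response object, return a Request object or raise an IgnoreRequest exception.

If it returns a Response (it could be the same given response, or a brand-new one), that response will continue to be processed with the process_response() of the next middleware in the chain.

If it returns a Request object, the middleware chain is halted and the returned request is rescheduled to be downloaded in the future. This is the same behavior as if a request is returned from process_request().

If it raises an IgnoreRequest exception, the errback function of the request (Request.errback) is called. If no code handles the raised exception, it is ignored and not logged (unlike other exceptions).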

Answer 2

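From the docs at https://doc.scrapy.org/en/latest/topics/request-response.html#textresponse-objects:

TextResponse objects add encoding capabilities to the base Response class, which is meant to be used only for binary data, such as images, sounds or any media file.

A bare Response object has no selector attribute; TextResponse and its subclasses do: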

In [1]: from scrapy.http import Response, TextResponse

In [2]: Response('http://example.org', body=b'<html><body><div>Something</div></body></html>').selector
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-6fdd116632d2> in <module>
----> 1 Response('http://example.org', body=b'<html><body><div>Something</div></body></html>').selector

AttributeError: 'Response' object has no attribute 'selector'

In [3]: TextResponse('http://example.org', body=b'<html><body><div>Something</div></body></html>').selector
Out[3]: <Selector xpath=None data='<html><body><div>Something</div></body><'>

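I don't see a new response being created in your code, but from the beginning of your question ("Do you build a Response object and return that (...)"), I suspect the code snippet might be incomplete and that the response returned at #2 could be a manually created Response.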
