Question

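I'm writing a custom cache backend in Scrapy and I want to save the response body to Elasticsearch for full-text search. The response I get is of the default type scrapy.http.response.Response, which holds the body as bytes, and when I try to convert it to a string I end up with something like \u001f�\b\u0000\u0000\u0000\u0000\u0000\u0000\u0003�}�r۸��o�j�\u0001f��\u0019�C�-۲��|N2;�q�Φ��\u000b\"!�6Er\bҲN��\u001a�o�e\u001fe�d�\u001b EJ�,[ά�Ιql\u0012\u0004\u001a�F��h.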
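The sample output is a telltale sign. It starts with \u001f, a replacement character and \b, i.e. the bytes 0x1f 0x8b 0x08, which is the gzip header (magic number 0x1f 0x8b, compression method 8 = deflate). So the body reaching the storage is most likely still gzip-compressed rather than merely decoded with the wrong charset. Below is a minimal, stdlib-only sketch of detecting and undoing that. body_to_text is a hypothetical helper, and it assumes UTF-8 when no charset is known.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def body_to_text(body: bytes, encoding: str = "utf-8") -> str:
    """Decode a response body, gunzipping it first if it is still compressed.

    Assumption: the charset is known (UTF-8 by default); a real page may
    declare another one in its Content-Type header or a <meta> tag.
    """
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return body.decode(encoding, errors="replace")


if __name__ == "__main__":
    compressed = gzip.compress("<p>héllo</p>".encode("utf-8"))
    print(compressed[:3])            # b'\x1f\x8b\x08', same prefix as the garbled text
    print(body_to_text(compressed))  # <p>héllo</p>
```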

So the question is: how can I get an HtmlResponse in store_response so that I can access the actual text? I looked through the settings but couldn't find anything.

class ESCacheStorage(object):

    ...  # other methods elided

    def store_response(self, spider, request, response):
        print("response type is {}".format(type(response)))
        # response type is <class 'scrapy.http.response.Response'>
        # But I want the response type <class 'scrapy.http.response.html.HtmlResponse'>

Answer 1

Setting COMPRESSION_ENABLED = False in settings.py fixed this issue.
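The change is a single line in settings.py (the same key can also go in a spider's custom_settings dict). As far as I understand, it works because the setting turns off HttpCompressionMiddleware, so Scrapy stops sending an Accept-Encoding header that offers gzip/deflate; servers then usually reply with an uncompressed body, and Scrapy chooses HtmlResponse from the Content-Type header. The trade-off is larger downloads, and a server that compresses regardless would still yield a plain Response.

```python
# settings.py
# Disables HttpCompressionMiddleware, so requests no longer advertise
# gzip/deflate support via the Accept-Encoding header.
COMPRESSION_ENABLED = False
```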
