Question

This is how I download images. Now I have created a second pipeline to insert the scraped data.

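import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline


class CmindexPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Schedule one download request per image URL on the item.
        for image_url in item['image_url']:
            yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        # results is a list of (success, info) tuples; keep successful paths only.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        print("From Images Items", item)
        return item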



class MysqlPipline(object):
    def process_item(self, item, spider):
        # This lookup is the one that raises the KeyError shown below.
        print("From Process Items", item['image_paths'])
        return item

This is my settings.py:

ITEM_PIPELINES = {
    'cmindex.pipelines.CmindexPipeline': 1,
    'cmindex.pipelines.MysqlPipline': 2,
}
# Raw string so the backslashes in the Windows path are not treated as escapes.
IMAGES_STORE = r'E:\WorkPlace\python\cmindex\cmindex\img'
IMAGES_THUMBS = {
    '16X16': (16, 16)
}

Unfortunately, I cannot access item['image_paths'] in process_item. It raises this error:

KeyError: 'image_paths'

If anyone knows what I am doing wrong, please advise.

Answer 1

process_item is called before item_completed, so the item does not have image_paths yet.

To access image_paths, do the work inside item_completed, or write another pipeline that runs after the image pipeline.
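
As a rough sketch of the second option, a pipeline that runs after the image pipeline could read the paths defensively instead of indexing the item directly. The class name StoreImagePathsPipeline and the use of .get() with an empty default are assumptions made for illustration, not part of the original code, and the actual MySQL insert is left out.

```python
class StoreImagePathsPipeline:
    """Hypothetical pipeline meant to run after the image pipeline."""

    def process_item(self, item, spider):
        # The item behaves like a dict; .get() avoids a KeyError when
        # 'image_paths' was never set on this item.
        image_paths = item.get('image_paths', [])
        if image_paths:
            print("Stored image paths:", image_paths)
        else:
            print("No image paths on item")
        # Always return the item so any later pipeline still receives it.
        return item


# Quick check with a plain dict standing in for a scraped item.
pipeline = StoreImagePathsPipeline()
result = pipeline.process_item({'image_paths': ['full/abc.jpg']}, spider=None)
```

In settings.py such a class would be listed in ITEM_PIPELINES with a larger number than 'cmindex.pipelines.CmindexPipeline', since Scrapy runs pipelines in ascending order of that value.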
