How do I change the filenames of images downloaded through the images pipeline in Scrapy 0.16?

Asked: 2012-12-02 14:04:37

Tags: python scrapy pipeline

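I want to change the filenames of the downloaded images from the hash I currently get to the image's alt attribute or something similar.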

from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.exceptions import DropItem
from scrapy.http import Request


class DocosPipeline(object):
    def process_item(self, item, spider):
        return item


class DocosImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Schedule one download request per image URL on the item.
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        # Keep the storage paths of the images that downloaded successfully.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        return item

I have tried overriding the image_key method, but I can't seem to get it to work. This is the method:

def image_key(self, url):
    image_guid = hashlib.sha1(url).hexdigest()
    return 'full/%s.jpg' % (image_guid)
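
To see why the saved names look like hashes, the computation in the image_key method quoted above can be reproduced outside Scrapy. This is only a sketch: `default_image_key` is a hypothetical standalone function, the URL is a made-up example, and the `encode()` call is an addition so the snippet also runs on Python 3, where `hashlib.sha1` requires bytes.

```python
import hashlib


def default_image_key(url):
    # Same logic as the quoted image_key body: SHA-1 of the URL, stored under full/.
    # encode() is an assumption added for Python 3 compatibility.
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "full/%s.jpg" % image_guid


# Hypothetical example URL; the result is "full/" + 40 hex characters + ".jpg".
key = default_image_key("http://example.com/images/cat.png")
print(key)
```

Nothing from the original filename survives: the name is derived from the whole URL, and the extension is always .jpg.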

I'm really stuck here; any help would be greatly appreciated.

1 Answer:

Answer 0: (score: 0)

I'm not sure where you put the image_key method, but the code below works fine for me:

class MyImagesPipeline(ImagesPipeline):

    # Name the downloaded file after the last path segment of its URL
    def image_key(self, url):
        image_guid = url.split('/')[-1]
        return 'full/%s' % (image_guid)

    def get_media_requests(self, item, info):