Downloading images into folders with Python

Date: 2018-02-28 13:34:33

Tags: python-2.7 scrapy

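I have a problem with Python and Scrapy. I believe the script is working and writes all the data to MongoDB, but when it scrapes, it still only puts the photos in the database. I want them downloaded into this structure instead: /Project/photos/link-page/name.jpg

Here is my code. This is items.py: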

import scrapy
from PIL import Image
class RedditItem(scrapy.Item):
    '''
    Defining the storage containers for the data we
    plan to scrape
    '''

    title = scrapy.Field()
    photoLink = scrapy.Field()

This is from settings.py:

ITEM_PIPELINES = {'scrapy.contrib.pipeline.images.ImagesPipeline': 1}
IMAGES_STORE = '/ProjectX/reddit/reddit/photos/'

And here is scrapper.py:

from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import HtmlResponse
from scrapy.selector import Selector
from datetime import datetime as dt
import scrapy
from reddit.items import RedditItem
from PIL import Image

def parse_following_urls(self, response):
    item = RedditItem()
    item['title'] = response.css('h1.kiwii-font-xlarge::text').extract_first()
    item['photoLink'] = response.css("div.kiwii-carousel-picture span::attr(src)").extract()

1 answer:

Answer 0 (score: 0)

If you want to store images as, for example, {IMAGES_STORE}/link-page/name.jpg, you need to subclass the default ImagesPipeline class and override its file_path method.

For example:

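from scrapy.pipelines.images import ImagesPipeline


class MyImagesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        # Build and return the relative path "link-page/name.jpg" here;
        # Scrapy saves the image under IMAGES_STORE at that path.
        raise NotImplementedError("generate the link-page/name.jpg value")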

Then add it as a pipeline in your settings file, in place of the default ImagesPipeline:

ITEM_PIPELINES = {'your_project.pipelines.MyImagesPipeline': 1}