I have a problem with Python and Scrapy. I believe the script is still working and putting all the data into MongoDB, but when it scrapes, the photos still only end up as links in the database; I want to download them into this structure: /Project/photos/link-page/name.jpg
Here is my code. This is items.py:
import scrapy
from PIL import Image
class RedditItem(scrapy.Item):
    '''
    Defining the storage containers for the data we
    plan to scrape
    '''
    title = scrapy.Field()
    photoLink = scrapy.Field()
This is from settings.py:
ITEM_PIPELINES = {'scrapy.contrib.pipeline.images.ImagesPipeline': 1}
IMAGES_STORE = '/ProjectX/reddit/reddit/photos/'
And here is my scrapper.py:
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import HtmlResponse
from scrapy.selector import Selector
from datetime import datetime as dt
import scrapy
from reddit.items import RedditItem
from PIL import Image
def parse_following_urls(self, response):
    item = RedditItem()
    item['title'] = response.css('h1.kiwii-font-xlarge::text').extract_first()
    item['photoLink'] = response.css("div.kiwii-carousel-picture span::attr(src)").extract()
Answer 0 (score: 0)
If you want to store images as, for example, {IMAGES_STORE}/link-page/name.jpg, you need to extend the default ImagesPipeline class and override its file_path method. For example:
from scrapy.pipelines.images import ImagesPipeline

class MyImagesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        # Code to generate {link-page/name.jpg} value
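The body above is left as a placeholder in the answer. A minimal sketch of one way to fill it in (not part of the original answer, and assuming the item carries the title and photoLink fields shown in the question) is to pass the page title along in request.meta and build the file name from the last segment of the image URL:

import scrapy
from scrapy.pipelines.images import ImagesPipeline


class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Assumption: the item exposes 'photoLink' (a list of image URLs) and
        # 'title' (the page name), as in the question's RedditItem.
        for url in item.get('photoLink', []):
            yield scrapy.Request(url, meta={'page': item.get('title') or 'unknown'})

    def file_path(self, request, response=None, info=None):
        # One folder per page, file named after the last URL segment,
        # giving something like <IMAGES_STORE>/link-page/name.jpg
        page = request.meta.get('page', 'unknown').strip().replace(' ', '-')
        name = request.url.split('/')[-1].split('?')[0] or 'image.jpg'
        return '%s/%s' % (page, name)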
Then add it as a pipeline in your settings file instead of the default ImagesPipeline:
ITEM_PIPELINES = {'your_project.pipelines.MyImagesPipeline': 1}
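With that in place, the relative path returned by file_path is appended to the IMAGES_STORE directory from your settings.py (here /ProjectX/reddit/reddit/photos/), so each image should be saved under that directory in the link-page/name.jpg layout the question asks for.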