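I am writing an image scraper using Scrapy with the default ImagesPipeline.
In general, everything works fine so far. However, I cannot get the path of the downloaded images.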
items.py:
class MyItem(scrapy.Item):
    name = scrapy.Field()
    type = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()
pipelines.py:
import scrapy
from scrapy.exceptions import DropItem

class MyPipeline(object):
    def get_media_requests(self, item, info):
        # Schedule one download request per image URL on the item.
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        # Keep the storage paths of the downloads that succeeded.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        return item
myspider.py:
import scrapy
from scrapy.contrib.spiders import Rule, CrawlSpider
from scrapy.contrib.linkextractors import LinkExtractor
from mycrawler.items import MyItem

class VscrawlerSpider(CrawlSpider):
    """docstring for VscrawlerSpider"""
    name = "myspider"
    allowed_domains = ["vesselfinder.com"]
    start_urls = [
        "https://www.vesselfinder.com/vessels?page=1"
    ]
    rules = [
        Rule(LinkExtractor(allow=r'vesselfinder.com/vessels\?page=[1-4]'),
             callback='parse_item', follow=True)
    ]

    def parse_item(self, response):
        ships = response.xpath('//div[@class="items"]/article')
        for ship in ships:
            item = MyItem()
            item['name'] = ship.xpath('div[2]/header/h1/a/text()').extract()[1].strip()
            item['image_urls'] = [ship.xpath('div[1]/a/picture/img/@src').extract()[0]]
            item['type'] = ship.xpath('div[2]/div[2]/div[2]/text()').extract()[0]
            str = item['image_paths'][0] + item['type'] + item['name']
            yield item
I get the following error:
exceptions.KeyError: 'image_paths'
I also tried item['images'][0].path, but that still raises an error. Where does this error come from?
Answer 0 (score: 0)
You have not defined the image_paths field on your item. Define it:
class MyItem(scrapy.Item):
    # ...
    image_paths = scrapy.Field()
Alternatively, you may have meant to use the images field instead.
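One more thing to keep in mind: item pipelines run only after the spider has yielded the item, so no download path exists yet inside parse_item, whichever field you read. The paths first become available in item_completed. Below is a minimal sketch of that step in plain Python, so it runs without Scrapy installed. The helper name extract_image_paths, the sample URLs, paths, and checksums, and the plain dict used in place of MyItem are all made up for illustration; the only thing assumed about Scrapy is that results is a list of (success, info) tuples where info is a dict containing "path" for successful downloads.

```python
def extract_image_paths(results):
    """Return the storage paths of all successfully downloaded images.

    `results` mimics the argument passed to item_completed():
    a list of (success, info) tuples. This helper is hypothetical.
    """
    return [info["path"] for ok, info in results if ok]


# Made-up sample data in the shape described above.
sample_results = [
    (True, {"url": "https://example.com/a.jpg",
            "path": "full/aaa.jpg",
            "checksum": "0" * 32}),
    # A failed download carries an error object instead of a dict.
    (False, Exception("download failed")),
    (True, {"url": "https://example.com/b.jpg",
            "path": "full/bbb.jpg",
            "checksum": "1" * 32}),
]

# A plain dict stands in for MyItem here; with a real scrapy.Item the
# image_paths field would have to be declared on the class first.
item = {"name": "Example ship", "type": "Cargo"}
item["image_paths"] = extract_image_paths(sample_results)

# Combining the path with other fields is safe at this point.
label = item["image_paths"][0] + item["type"] + item["name"]
print(label)
```

So any logic that needs the path, such as the string concatenation in your parse_item, probably belongs in item_completed (or in a later pipeline stage) rather than in the spider.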