I'm a beginner with Scrapy. I'm trying to download images by setting up the images pipeline, but I get an error that I don't understand.
books.py
class Books2Spider(Spider):
    name = 'books2'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com']

    def parse(self, response):
        books = response.xpath('//h3/a/@href').extract()
        ...
        pass

    def parse_book(self, response):
        l = ItemLoader(item=BooksCrawlerItem(), response=response)
        title = response.css('h1::text').extract_first()
        price = response.xpath('//*[@class="price_color"]/text()').extract_first()
        image_urls = response.xpath('//img/@src').extract_first()
        image_urls = image_urls.replace('../..', 'http://books.toscrape.com/')
        l.add_value('title', title)
        l.add_value('price', price)
        l.add_value('image_urls', image_urls)
        return l.load_item()
settings.py
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1
}
IMAGES_STORE = {
    '/home/jaki/Dev/WebScrapingScratch/images'
}
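I run the crawl with the command scrapy crawl books2. If everything worked, the images would be downloaded, but instead I get this error:
...
    if os.path.isabs(uri):  # to support win32 paths like: C:\some\dir
  File "/usr/lib/python3.6/posixpath.py", line 66, in isabs
    s = os.fspath(s)
TypeError: expected str, bytes or os.PathLike object, not set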
Answer 0 (score: 1)
The IMAGES_STORE setting must be a single path.
Replace:
IMAGES_STORE = {
    '/home/jaki/Dev/WebScrapingScratch/images'
}
with:
IMAGES_STORE = '/home/jaki/Dev/WebScrapingScratch/images'
{'asdf'} is a set containing the string asdf, not a string, which is why the error message ends with "not set".
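To see the difference outside of Scrapy, here is a minimal sketch that passes both forms to os.path.isabs, the call shown in the traceback. The path /tmp/images and the helper is_valid_store are hypothetical and only used for illustration; they are not part of Scrapy.

```python
import os

# Hypothetical path, used only for illustration.
store_as_set = {'/tmp/images'}  # braces without key: value pairs build a set
store_as_str = '/tmp/images'    # a plain string, the form IMAGES_STORE needs

print(type(store_as_set).__name__)
print(type(store_as_str).__name__)


def is_valid_store(value):
    """Return True if os.path.isabs accepts the value, False on TypeError."""
    try:
        os.path.isabs(value)
        return True
    except TypeError:
        # os.fspath() rejects anything that is not str, bytes or os.PathLike
        return False


print(is_valid_store(store_as_set))
print(is_valid_store(store_as_str))
```

The set is rejected before the path is even inspected, so the directory does not need to exist for the error to appear.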