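I am very new to Scrapy, so even basic tasks are hard for me. My problem is that I cannot rename the images I download. I copied part of the code from the following site: "http://scrapingauthority.com/scrapy-download-images/", but it does not work. The code of my spider looks like this: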
from scrapy import Request, Spider
from Imagenes.items import ImagenesItem


class AuthorSpider(Spider):
    name = 'imagenpruebarenombrar'
    start_urls = [
        "http://quotes.toscrape.com/",
    ]

    def parse(self, response):
        item = ImagenesItem()
        img_urls = [
            "http://automationpractice.com/img/p/5/5-large_default.jpg",
            "http://automationpractice.com/img/p/6/6-large_default.jpg",
            "http://automationpractice.com/img/p/7/7-large_default.jpg",
        ]
        img_name = [  # These are the names I want for my images
            "1",
            "2",
            "3",
        ]
        item["image_urls"] = img_urls
        item["image_name"] = img_name
        return item
My items code (items.py):
import scrapy


class ImagenesItem(scrapy.Item):
    images = scrapy.Field()
    image_urls = scrapy.Field()
    image_name = scrapy.Field()
Answer 0 (score: 0)
You have to add CustomImageNamePipeline, not ImagesPipeline, to the settings.
If the class lives in the file pipelines.py, add this to settings.py:
ITEM_PIPELINES = {'pipelines.CustomImageNamePipeline': 1}
Or, more likely, with the project name as a prefix:
ITEM_PIPELINES = {'Imagenes.pipelines.CustomImageNamePipeline': 1}
If all the code is in a single file (no project was created), add this to that same file instead:
ITEM_PIPELINES = {'__main__.CustomImageNamePipeline': 1}
Answer 1 (score: 0)
First, you need to edit settings.py:
ITEM_PIPELINES = {'Imagenes.pipelines.CustomImageNamePipeline': 1}
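Keep in mind that Scrapy's images pipeline stays disabled unless a storage directory is configured through IMAGES_STORE. A minimal sketch of the relevant part of settings.py, assuming the project is named Imagenes as above; the directory name downloaded_images is only a placeholder, not something from the question:

```python
# settings.py (fragment)

# Register the custom pipeline instead of the stock ImagesPipeline.
ITEM_PIPELINES = {'Imagenes.pipelines.CustomImageNamePipeline': 1}

# Directory where the downloaded images are written.
# 'downloaded_images' is a hypothetical name; use any writable path.
IMAGES_STORE = 'downloaded_images'
```

The paths returned by file_path are relative to this directory.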
Next, in your pipelines.py:
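import scrapy
from scrapy.pipelines.images import ImagesPipeline


class CustomImageNamePipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Each entry is a dict with 'url' and 'name' keys, built in the spider.
        for image in item.get('image_urls', []):
            yield scrapy.Request(image["url"], meta={'image_name': image["name"]})

    def file_path(self, request, response=None, info=None, *, item=None):
        # Store the file under the name carried in the request meta.
        return '%s.jpg' % request.meta['image_name']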
Finally, in the spider:
def parse(self, response):
    item = ImagenesItem()
    img_urls = [
        "http://automationpractice.com/img/p/5/5-large_default.jpg",
        "http://automationpractice.com/img/p/6/6-large_default.jpg",
        "http://automationpractice.com/img/p/7/7-large_default.jpg",
    ]
    img_names = [  # These are the names I want for my images
        "1",
        "2",
        "3",
    ]
    # Pair every URL with its name so the pipeline receives both together.
    images = []
    for image_url, image_name in zip(img_urls, img_names):
        images.append({'url': image_url, 'name': image_name})
    item["image_urls"] = images
    yield item
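To check what the pipeline will receive without running a crawl, the pairing step can be tried on its own in plain Python. This is only a sketch: pair_images and target_filename are hypothetical helpers written for illustration, not part of Scrapy. The length check is an addition, since zip() silently drops the extra entries when the two lists differ in length.

```python
def pair_images(urls, names):
    """Pair each URL with its desired name (hypothetical helper)."""
    # zip() stops at the shorter list, so verify the lengths first.
    if len(urls) != len(names):
        raise ValueError("need exactly one name per image URL")
    return [{"url": url, "name": name} for url, name in zip(urls, names)]


def target_filename(image):
    """Mirror the '%s.jpg' formatting used in file_path (hypothetical helper)."""
    return "%s.jpg" % image["name"]


img_urls = [
    "http://automationpractice.com/img/p/5/5-large_default.jpg",
    "http://automationpractice.com/img/p/6/6-large_default.jpg",
    "http://automationpractice.com/img/p/7/7-large_default.jpg",
]
img_names = ["1", "2", "3"]

images = pair_images(img_urls, img_names)
filenames = [target_filename(image) for image in images]
print(filenames)  # ['1.jpg', '2.jpg', '3.jpg']
```

If the printed names look right, the same list of dicts can be assigned to item["image_urls"] as in the spider above.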