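I have a simple Scrapy project that crawls www.anthropologie.com for sale items. I want to use the standard ImagesPipeline to download the images of the sale items I'm scraping. I've enabled the standard ImagesPipeline in settings.py and set IMAGES_STORE to a valid directory. I have the necessary fields, image_url and images. My spider extracts the correct image URLs, which I verified by opening them in a browser. When I run the spider, the log shows that the pipeline is enabled, but there is no sign that any images are being downloaded, and I can't find any images in the target directory.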
Here are samples of my code:
settings.py:
BOT_NAME = 'LTTcrawlers'
SPIDER_MODULES = ['LTTcrawlers.spiders']
NEWSPIDER_MODULE = 'LTTcrawlers.spiders'
ITEM_PIPELINES = {'scrapy.contrib.pipeline.images.ImagesPipeline': 1}
IMAGES_STORE = 'images'
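As a side note on where to look for the files: with a relative IMAGES_STORE such as 'images', the directory is resolved against the working directory you run `scrapy crawl` from, and by default the pipeline names each full-size image after the SHA1 hash of its URL inside a `full/` subdirectory. The sketch below (Python 3) only approximates that default naming scheme; `expected_image_path` is a hypothetical helper, not part of Scrapy, and the URL is a placeholder.

```python
import hashlib
import os


def expected_image_path(images_store, image_url):
    """Approximate where ImagesPipeline saves the full-size copy of image_url.

    Hypothetical helper (not a Scrapy API). Assumes the default naming scheme
    <IMAGES_STORE>/full/<sha1 of the URL>.jpg.
    """
    digest = hashlib.sha1(image_url.encode("utf-8")).hexdigest()
    return os.path.join(images_store, "full", digest + ".jpg")


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    url = "http://www.example.com/images/product.jpg"
    path = expected_image_path("images", url)
    print(path)
    # A relative IMAGES_STORE is resolved against the current working directory.
    print(os.path.abspath(path))
```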
items.py:
from scrapy.item import Item, Field


class saleItem(Item):
    image_url = Field()
    images = Field()
    retailer = Field()
    name = Field()
    prev_price = Field()
    sale_price = Field()
    link = Field()
    url = Field()
anthro_spider.py:
from scrapy.spider import Spider
from scrapy.selector import Selector
from LTTcrawlers.items import saleItem


class AnthroSpider(Spider):
    name = "anthro"
    allowed_domains = ['www.anthropologie.com']
    start_urls = [
        'http://www.anthropologie.com/anthro/category/clothing/shopsale-clothing.jsp?&id=SHOPSALE-CLOTHING&facetSelected=true&itemCount=100&bucketPriceHigh=10.0&cm_sp=LEFTNAV-_-PRICE-_-BUCKETPRICE%3C25.0'
    ]

    def parse(self, response):
        sel = Selector(response)
        items = sel.xpath('//div[@class="category-items"]/div')
        sale_items = []
        for item in items:
            sale_item = saleItem()
            sale_item["retailer"] = "Anthropologie"
            sale_item["name"] = item.xpath("./div[@class='item-description']/a/text()").extract()[0].encode('ascii', 'ignore')
            sale_item["sale_price"] = item.xpath("./div[@class='item-description']/div/span/text()").extract()[0].encode('ascii', 'ignore')
            sale_item["prev_price"] = item.xpath("./div[@class='item-description']/div/span/span/text()").extract()[0].encode('ascii', 'ignore')
            sale_item["url"] = item.xpath("./div[@class='item-description']/a/@href").extract()[0].encode('ascii', 'ignore')
            sale_item["image_url"] = item.xpath('.//img/@data-original').extract()
            sale_items.append(sale_item)
        return sale_items
No errors are reported, so I can't figure out what I'm missing.
Answer (score: 1)
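In items.py the field should be image_urls = Field(), not image_url = Field(). The images pipeline reads the URLs to download from the image_urls field (and writes its results to images), so the spider must also assign sale_item["image_urls"] instead of sale_item["image_url"].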
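Why there is no error: the images pipeline looks the URLs up with a default of an empty list, so an item without an image_urls field simply schedules no downloads. Separately, a Scrapy Item refuses to set a field that is not declared, which is why items.py and the spider have to be renamed together. The Python 3 sketch below models both behaviours with plain dictionaries; `FieldRestrictedItem` and `media_urls` are hypothetical stand-ins written for illustration, not Scrapy's actual source, and the URL is a placeholder.

```python
class FieldRestrictedItem(dict):
    """Simplified stand-in for scrapy.item.Item: only declared fields may be set."""

    fields = frozenset()

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError("%s does not support field: %s" % (type(self).__name__, key))
        super().__setitem__(key, value)


class OldSaleItem(FieldRestrictedItem):
    # Mirrors the question's items.py (abridged).
    fields = frozenset(["image_url", "images", "name"])


class FixedSaleItem(FieldRestrictedItem):
    # Mirrors the corrected items.py (abridged).
    fields = frozenset(["image_urls", "images", "name"])


def media_urls(item, urls_field="image_urls"):
    """Model of the pipeline's lookup: a missing field silently yields []."""
    return list(item.get(urls_field, []))


if __name__ == "__main__":
    old = OldSaleItem()
    old["image_url"] = ["http://www.example.com/a.jpg"]
    print(media_urls(old))  # [] -> nothing is downloaded, and no error is raised

    fixed = FixedSaleItem()
    fixed["image_urls"] = ["http://www.example.com/a.jpg"]
    print(media_urls(fixed))  # ['http://www.example.com/a.jpg']

    try:
        # Renaming the field in items.py without updating the spider fails loudly.
        fixed["image_url"] = ["http://www.example.com/a.jpg"]
    except KeyError as exc:
        print("spider not updated:", exc)
```

In Scrapy versions that support the IMAGES_URLS_FIELD setting, pointing it at a different field name is an alternative to renaming the field.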