Scrapy only scrapes 1 image

Time: 2016-08-01 10:09:13

Tags: scrapy web-crawler

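I want to scrape the images from this link: "http://vnexpress.net/photo/cuoc-song-do-day/nguoi-trung-quoc-ra-be-boi-danh-mat-chuoc-tranh-nong-3445592.html". On my computer the code scrapes only one image, while on my friend's computer it scrapes all of them. Please help me.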

import scrapy

from scrapy.contrib.spiders import Rule, CrawlSpider
from scrapy.contrib.linkextractors import LinkExtractor
from imgur.items import ImgurItem

class ImgurSpider(CrawlSpider):
    name = 'imgur'
    allowed_domains = ['vnexpress.net']
    start_urls = ['http://vnexpress.net/photo/cuoc-song-do-day/nguoi-trung-quoc-ra-be-boi-danh-mat-chuoc-tranh-nong-3445592.html']
    # rules = [Rule(LinkExtractor(allow=['/*']), 'parse123')]

    def parse(self, response):
        image = ImgurItem()
        # image['title'] = response.xpath(\
        #   "//img[data-notes-url=""]").extract()
        rel = response.xpath("//div[@id='article_content']//img/@src").extract()
        image['image_urls'] = [rel[0]]
        return image

1 answer:

Answer 0 (score: 0)

rel = response.xpath("//div[@id='article_content']//img/@src").extract()
image['image_urls'] = [rel[0]]

extract() returns a list of every matching src value, so indexing it with [0] keeps only the first link. Try

image['image_urls'] = rel
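
To make the difference concrete, here is a minimal sketch that runs without Scrapy. It uses the standard library's xml.etree.ElementTree as a stand-in for the Scrapy selector, and the markup and example.com URLs are made up for illustration; only the div id and the `rel` name come from the question.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup standing in for the article page (assumption).
SAMPLE_PAGE = """
<html><body>
  <div id="article_content">
    <p><img src="http://example.com/photo-1.jpg" /></p>
    <p><img src="http://example.com/photo-2.jpg" /></p>
    <p><img src="http://example.com/photo-3.jpg" /></p>
  </div>
  <img src="http://example.com/banner.jpg" />
</body></html>
""".strip()

root = ET.fromstring(SAMPLE_PAGE)

# Same idea as the spider's XPath: every <img> under div#article_content.
rel = [img.get("src") for img in root.findall(".//div[@id='article_content']//img")]

first_only = [rel[0]]  # what the question's code stores: a one-element list
all_urls = rel         # what the answer suggests: every matched URL

print(len(first_only), len(all_urls))  # prints: 1 3
```

The banner image outside the div is not matched in either case; the only thing that changes is whether the item receives one URL or the whole list.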

You can also split the code into a function that parses the image URLs and a callback that downloads each image.