Why does Scrapy raise "Response content isn't text"?

Time: 2019-06-21 11:13:37

Tags: python-2.7 scrapy

I'm trying to scrape the classified ads from an e-paper. When I run my code, I get this error:

NotSupported: Response content isn't text

Here is my code:

import scrapy
from imagecrawl.items import ImagecrawlItem


class ImgspiderSpider(scrapy.Spider):
    name = "imgspider"
    start_urls = ['http://www.deccanheraldepaper.com/data/pp3-20190621_10/webepaper/photos/541862.png']

    def parse(self, response):
        link = response.css('div.flex_grid img::attr(srcset)').extract()
        urls = []
        for pairs in link:
            for each in pairs.split(','):
                urls.append(each[:-3].strip())

        for img_url in urls:
            yield ImagecrawlItem(image_urls=[img_url])

items.py

import scrapy


class ImagecrawlItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()

    image_urls = scrapy.Field()
    images = scrapy.Field()
    pass
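A side note on the srcset loop in parse: slicing off the last three characters (each[:-3]) only removes two-character density descriptors such as "2x". A width descriptor such as "300w" leaves part of the number attached to the URL ("small.jpg 300w" becomes "small.jpg 3"). A minimal sketch that splits each candidate on whitespace instead, assuming the URLs themselves contain no commas; the helper name urls_from_srcset is hypothetical:

```python
def urls_from_srcset(srcset):
    """Return the image URLs from an HTML srcset attribute value.

    Simplification (assumption): URLs contain no commas, so splitting
    the attribute on "," yields one "url [descriptor]" candidate each.
    """
    urls = []
    for candidate in srcset.split(","):
        parts = candidate.split()  # "url 300w" -> ["url", "300w"]
        if parts:                  # skip empty candidates, e.g. from ""
            urls.append(parts[0])  # keep the URL, drop "2x" / "300w"
    return urls


print(urls_from_srcset("small.jpg 300w, large.jpg 1200w"))
# → ['small.jpg', 'large.jpg']
```

Inside the spider, the inner loop would then become urls.extend(urls_from_srcset(pairs)).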

1 answer:

Answer 0 (score: 0)

Your start_urls points to an image, not to an HTML page:

start_urls = ['http://www.deccanheraldepaper.com/data/pp3-20190621_10/webepaper/photos/541862.png']

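Remove it and use the URL of the page that contains the image links instead. For a PNG, Scrapy builds a plain binary Response rather than an HtmlResponse, and a plain Response has no selector, so the response.css(...) call in parse raises NotSupported.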