I am trying to scrape the classified ads in an e-newspaper. When I run my code, I get this error:
NotSupported: Response content isn't text
Here is my code:
import scrapy
from imagecrawl.items import ImagecrawlItem


class ImgspiderSpider(scrapy.Spider):
    name = "imgspider"
    start_urls = ['http://www.deccanheraldepaper.com/data/pp3-20190621_10/webepaper/photos/541862.png']

    def parse(self, response):
        link = response.css('div.flex_grid img::attr(srcset)').extract()
        urls = []
        for pairs in link:
            for each in pairs.split(','):
                urls.append(each[:-3].strip())
        for img_url in urls:
            yield ImagecrawlItem(image_urls=[img_url])
items.py
import scrapy


class ImagecrawlItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()
Answer 0 (score: 0)
Your start_urls entry is the URL of an image:
start_urls = ['http://www.deccanheraldepaper.com/data/pp3-20190621_10/webepaper/photos/541862.png']
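Remove it and put in the URL of the HTML page that contains the image links. Scrapy fetched the PNG as a binary response, and response.css() only works on text responses, which is why it raised NotSupported.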
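Once start_urls points at an HTML page, parse will receive real srcset values, and the each[:-3] slice in the question becomes the next weak spot: it assumes every candidate ends in a two-character descriptor such as 2x, so a width descriptor like 480w would leave part of the descriptor attached. A minimal sketch of a more tolerant approach, using only the standard library, is below. The names srcset_urls and base_url are hypothetical, and the sketch assumes the image URLs themselves contain no commas.

```python
from urllib.parse import urljoin


def srcset_urls(srcset, base_url=""):
    """Return the image URLs listed in a srcset attribute value.

    Each comma-separated candidate has the form "URL [descriptor]",
    where the descriptor (for example "2x" or "480w") is optional.
    Assumption: the URLs themselves contain no commas.
    """
    urls = []
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue  # skip empty candidates such as a trailing comma
        # The URL is the first whitespace-separated token; resolve it
        # against the page URL so relative paths become absolute.
        urls.append(urljoin(base_url, parts[0]))
    return urls
```

In the spider, parse could call srcset_urls(value, response.url) for each value returned by the existing CSS selector and yield one ImagecrawlItem per resulting URL.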