Unable to scrape image URLs (Scrapy)

Posted: 2021-02-26 21:00:48

Tags: python mongodb scrapy

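I am trying to scrape Flipkart products with Scrapy. Every field is extracted correctly except the product image URL. When I try to extract the image URL, it returns a list of empty strings, as shown in the image below.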

[Screenshot: scraped items with image_url returned as a list of empty strings]

Project code

menscloths.py (spider)

import scrapy
from ..items import FlipcartItem

class MensclothsSpider(scrapy.Spider):
    name = 'menscloths'
    next_page=2
    start_urls = ['https://www.flipkart.com/clothing-and-accessories/topwear/pr?sid=clo%2Cash&otracker=categorytree&p%5B%5D=facets.ideal_for%255B%255D%3DMen&page=1']

    def parse(self, response):
        products=response.css("div._1xHGtK")
        for product in products:
            # Create a new item for every product; reusing one instance across
            # iterations would yield the same, repeatedly overwritten object.
            items=FlipcartItem()

            name = product.css(".IRpwTa::text").extract()
            brand = product.css("._2WkVRV::text").extract()
            original_price = product.css("._3I9_wc::text").extract()[1]
            sale_price = product.css("._30jeq3::text").extract()[0][1:]
            # This is the only field that comes back as a list of empty strings.
            image_url = product.css("._2r_T1I::attr('src')").extract()
            product_page_url = "https://www.flipkart.com"+product.css("._2UzuFa::attr('href')").extract()[0]
            product_category = "men topwear"

            items["name"]=name
            items["brand"]=brand
            items["original_price"]=original_price
            items["sale_price"]=sale_price
            items["image_url"]=image_url
            items["product_page_url"]=product_page_url
            items["product_category"]=product_category
            yield items
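Whatever the cause of the empty `src` values, it can help to clean the extracted lists before storing them, so an item records an empty list rather than `['', '']`, and relative links are resolved properly instead of by string concatenation. A minimal standard-library sketch; `normalize_urls` is a hypothetical helper, not part of Scrapy. Inside a spider, Scrapy's `response.urljoin(href)` performs the same kind of join against the response URL.

```python
from typing import Iterable, List
from urllib.parse import urljoin


def normalize_urls(raw: Iterable[str], base_url: str) -> List[str]:
    """Hypothetical helper: strip whitespace, drop empty values, resolve relative URLs.

    `raw` is what `.extract()` returns for an attribute selector such as
    `::attr(src)` or `::attr(href)`.
    """
    cleaned = []
    for value in raw:
        value = value.strip()
        if value:  # skip the empty strings the selector can return
            cleaned.append(urljoin(base_url, value))
    return cleaned


base = "https://www.flipkart.com/"
print(normalize_urls(["", "  "], base))  # []
print(normalize_urls(["/some-product/p/itm123"], base))
# ['https://www.flipkart.com/some-product/p/itm123']
```

In `parse`, this could be called as `normalize_urls(product.css("._2r_T1I::attr('src')").extract(), response.url)`. It does not make missing images appear; it only keeps blanks out of the stored data.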

            

items.py

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class FlipcartItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    brand = scrapy.Field()
    original_price = scrapy.Field()
    sale_price = scrapy.Field()
    image_url = scrapy.Field()
    product_page_url = scrapy.Field()
    product_category = scrapy.Field()

settings.py

BOT_NAME = 'flipcart'

SPIDER_MODULES = ['flipcart.spiders']
NEWSPIDER_MODULE = 'flipcart.spiders'


ITEM_PIPELINES = {
   'flipcart.pipelines.FlipcartPipeline': 300,
}

Thanks in advance.

1 Answer:

Answer 0 (score: 2)

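I've seen this happen many times before. If you watch the page closely while it loads, you can see that the images only appear after a short delay (for me, at least, it took about 1 second). Your code, however, just loads the page and immediately tries to read the images instead of waiting for them to load. You need some kind of wait mechanism that waits until the images have loaded and only then grabs them. Note that Scrapy on its own only downloads the HTML and does not execute JavaScript, so this waiting has to happen in a tool that renders the page, such as Selenium, Splash or Playwright.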
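The "wait" described above is usually a poll-until-ready loop with a timeout, the same idea behind Selenium's `WebDriverWait` or Playwright's `wait_for_selector`. A minimal standard-library sketch, assuming a hypothetical `get_image_srcs` callable that re-reads the image `src` values from a page that a browser tool keeps rendering; here that page is simulated by `make_fake_page`, which is also hypothetical.

```python
import time
from typing import Callable, List


def wait_for(
    condition: Callable[[], List[str]],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> List[str]:
    """Poll `condition` until it returns a non-empty list or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:  # at least one non-empty image src was found
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


def make_fake_page(delay: float) -> Callable[[], List[str]]:
    """Hypothetical stand-in for a rendered page whose images appear after `delay` seconds."""
    loaded_at = time.monotonic() + delay

    def get_image_srcs() -> List[str]:
        srcs = ["https://example.com/img/1.jpg"] if time.monotonic() >= loaded_at else [""]
        return [s for s in srcs if s]  # keep only non-empty src values

    return get_image_srcs


srcs = wait_for(make_fake_page(1.0), timeout=5.0, interval=0.2)
print(srcs)  # ['https://example.com/img/1.jpg']
```

Reading the page once (as the spider does) corresponds to calling `get_image_srcs()` a single time immediately after load, which in this simulation returns an empty list.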