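I am trying to scrape Flipkart products with Scrapy. Every field is extracted except the product image URL. When I try to extract the image URL, it returns a list of empty strings, as shown in the image below.
Project code
menscloths.py (spider)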
import scrapy

from ..items import FlipcartItem


class MensclothsSpider(scrapy.Spider):
    name = 'menscloths'
    next_page = 2
    start_urls = ['https://www.flipkart.com/clothing-and-accessories/topwear/pr?sid=clo%2Cash&otracker=categorytree&p%5B%5D=facets.ideal_for%255B%255D%3DMen&page=1']

    def parse(self, response):
        # Each product card on the listing page
        products = response.css("div._1xHGtK")
        for product in products:
            # Create a fresh item per product so yielded items do not share state
            items = FlipcartItem()
            name = product.css(".IRpwTa::text").extract()
            brand = product.css("._2WkVRV::text").extract()
            original_price = product.css("._3I9_wc::text").extract()[1]
            # Drop the leading currency symbol from the sale price
            sale_price = product.css("._30jeq3::text").extract()[0][1:]
            # This is the selector that comes back with empty strings
            image_url = product.css("._2r_T1I::attr('src')").extract()
            product_page_url = "https://www.flipkart.com" + product.css("._2UzuFa::attr('href')").extract()[0]
            product_category = "men topwear"
            items["name"] = name
            items["brand"] = brand
            items["original_price"] = original_price
            items["sale_price"] = sale_price
            items["image_url"] = image_url
            items["product_page_url"] = product_page_url
            items["product_category"] = product_category
            yield items
items.py
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html
import scrapy


class FlipcartItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    brand = scrapy.Field()
    original_price = scrapy.Field()
    sale_price = scrapy.Field()
    image_url = scrapy.Field()
    product_page_url = scrapy.Field()
    product_category = scrapy.Field()
settings.py
BOT_NAME = 'flipcart'
SPIDER_MODULES = ['flipcart.spiders']
NEWSPIDER_MODULE = 'flipcart.spiders'
ITEM_PIPELINES = {
    'flipcart.pipelines.FlipcartPipeline': 300,
}
Thanks in advance.
Answer 0 (score: 2)
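I have seen this happen many times before. If you watch the images closely while the page loads, you can see that they appear after a short delay (although, at least for me, loading takes only about 1 second). Your code, however, just loads the page and then immediately tries to get the images, without waiting for them to load. You need some kind of wait function that waits for the images to load before getting them.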
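As a rough illustration of what such a wait function could look like, here is a minimal polling sketch. It is not taken from the question's project: `wait_for` and `FakeImage` are hypothetical names, and `FakeImage` only simulates an image whose `src` is filled in late. Note also that Scrapy on its own downloads the HTML without running the page's JavaScript, so in practice the wait would probably have to happen in something that actually renders the page (a browser automation tool, for example); that part is an assumption and is not shown here.

```python
import time


def wait_for(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if it never appears.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


class FakeImage:
    """Hypothetical stand-in for an <img> element whose src is filled in late."""

    def __init__(self, final_src, polls_until_loaded):
        self._final_src = final_src
        self._polls_left = polls_until_loaded

    def get_src(self):
        # Return an empty string until the simulated image has "loaded".
        if self._polls_left > 0:
            self._polls_left -= 1
            return ""
        return self._final_src


if __name__ == "__main__":
    # The example URL is a placeholder, not a real Flipkart image.
    image = FakeImage("https://example.com/shirt.jpg", polls_until_loaded=3)
    print(wait_for(image.get_src, timeout=2.0, interval=0.05))
```

With a real rendered page, `condition` would be a function that reads the image's `src` attribute and returns it once it is non-empty; the timeout keeps the spider from hanging if an image never loads.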