Why do I get empty arrays in the data scraped from some websites?

Time: 2019-06-12 14:31:52

Tags: python json scrapy

Some of the items I scrape from certain websites come back as empty arrays. What could be causing this?

import scrapy
from scrapy.loader import ItemLoader
from jumia.items import JumiaItem


class LaptopsSpider(scrapy.Spider):

    name = "laptops"
    start_urls = [
        'https://www.jumia.co.ke/laptops/'
    ]

    def parse(self, response):
        for laptops in response.xpath("//div[contains(@class, '-gallery')]"):
            loader = ItemLoader(item=JumiaItem(), selector=laptops, response=response)
            loader.add_xpath('brand', ".//span[contains(@class, 'brand')]/text()")
            loader.add_xpath('name', ".//span[@class='name']/text()")
            loader.add_xpath('price', ".//span[@class='price-box ri']/span[contains(@class, 'price')][1]/span[@dir='ltr']/text()")
            loader.add_xpath('link', ".//a[@class='link']/@href")
            yield loader.load_item()
        next_page = response.xpath("//a[@title='Next']/@href").extract_first()

        if next_page is not None:
            next_page_link = response.urljoin(next_page)

            yield scrapy.Request(url=next_page_link, callback=self.parse)

1 Answer:

Answer 0 (score: 0)

I checked in the scrapy shell, and it looks like some of the blocks do not contain the information you need. Compare these results:

In [2]: len(response.xpath("//div[contains(@class, '-gallery')]").extract())
Out[2]: 48

In [3]: len(response.xpath("//div[contains(@class, '-gallery')]//span[contains(@class, 'brand')]").extract())
Out[3]: 40

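So the selector matches 48 blocks, but only 40 of them contain a brand, which means the other 8 yield empty items. I suggest adding a small check inside your for loop for the data you require (for example, the name or the brand) and simply calling continue when it is missing.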
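
The check described above could be sketched roughly as follows. This is only an illustration, not tested against the live site: iter_complete_blocks is a hypothetical helper name, and it assumes each block behaves like a Scrapy selector, i.e. block.xpath(...) returns something with an extract_first() method, as in the question's code. It skips a block instead of yielding it, which has the same effect as continue in the loop.

```python
def iter_complete_blocks(blocks, required_xpaths):
    """Yield only the blocks where every required XPath returns non-blank text.

    `blocks` is any iterable of selector-like objects (assumption: each one
    supports .xpath(query).extract_first(), as Scrapy selectors do).
    """
    for block in blocks:
        values = [block.xpath(query).extract_first() for query in required_xpaths]
        # extract_first() returns None when nothing matches; also treat
        # whitespace-only text as missing.
        if all(value and value.strip() for value in values):
            yield block


# Hypothetical use inside LaptopsSpider.parse (shown as a comment, not run here):
#
#     blocks = response.xpath("//div[contains(@class, '-gallery')]")
#     required = [".//span[contains(@class, 'brand')]/text()"]
#     for laptops in iter_complete_blocks(blocks, required):
#         loader = ItemLoader(item=JumiaItem(), selector=laptops, response=response)
#         ...  # same add_xpath calls as in the question
#         yield loader.load_item()
```

Filtering on the brand is a guess based on the counts above. If the 8 extra blocks turn out to be banners or ads, tightening the outer XPath so that it only matches product blocks may be the cleaner fix.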