Scrapy返回一个空的json文件

时间:2017-09-08 00:46:05

标签: python json scrapy scrapy-spider scrapy-shell

我正在尝试从网站获取数据,一切似乎都是正确的,并且在shell上测试了xpath。

# -*- coding: utf-8 -*-

from scrapy.contrib.spiders import CrawlSpider


class KabumspiderSpider(CrawlSpider):
    name = "kabumspider"
    allowed_domain = ["www.kabum.com.br"]
    start_urls = ["https://www.kabum.com.br"]


def parse(self, response):
        categorias = response.xpath('//p[@class = "bot-categoria"]/a/text()').extract()
        links = response.xpath('//p[@class = "bot-categoria"]/a/@href').extract()

        for categoria in zip(categorias, links):

            info = {
                'categoria': categoria[0],
                'link': categoria[1],
            }
            yield info

虽然输出似乎是:

[

我的代码出了什么问题?

1 个答案:

答案 0 :(得分:0)

我跑了刮刀,它对我运行良好。我发现的唯一问题是你的解析方法在课外。

# -*- coding: utf-8 -*-

from scrapy.contrib.spiders import CrawlSpider


class KabumspiderSpider(CrawlSpider):
    name = "kabumspider"
    allowed_domain = ["www.kabum.com.br"]
    start_urls = ["https://www.kabum.com.br"]

    def parse(self, response):
        categorias = response.xpath('//p[@class = "bot-categoria"]/a/text()').extract()
        links = response.xpath('//p[@class = "bot-categoria"]/a/@href').extract()

        for categoria in zip(categorias, links):
            info = {
                'categoria': categoria[0],
                'link': categoria[1],
            }
            yield info