Unable to store data scraped with Scrapy in JSON or CSV format

Time: 2017-03-06 14:34:54

Tags: json csv web-scraping scrapy

Here I want to store the data from the list given on the website's page. If I run the commands

response.css('title::text').extract_first()
response.css("article div#section-2 li::text").extract()

individually in the Scrapy shell, each one shows the expected output in the shell. Below is my code, which does not store the data in JSON or CSV format:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "medical"

    start_urls = ['https://medlineplus.gov/ency/article/000178.html/']


    def parse(self, response):
        yield
        {
            'topic': response.css('title::text').extract_first(),
            'symptoms': response.css("article div#section-2 li::text").extract()
        }

I tried to run this code using

scrapy crawl medical -o medical.json

1 Answer:

Answer 0 (score: 1)

You need to fix your URL: it is https://medlineplus.gov/ency/article/000178.htm, not https://medlineplus.gov/ency/article/000178.html/.

Also, and more importantly, you need to define an Item class and yield/return it from your spider's parse() callback:

import scrapy


class MyItem(scrapy.Item):
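    # One Field per value the spider exports; each becomes a key/column in the JSON/CSV output.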
    topic = scrapy.Field()
    symptoms = scrapy.Field()


class QuotesSpider(scrapy.Spider):
    name = "medical"

    allowed_domains = ['medlineplus.gov']
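    # Corrected start URL from the answer above: .htm, without the trailing slash.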
    start_urls = ['https://medlineplus.gov/ency/article/000178.htm']

    def parse(self, response):
        item = MyItem()

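        # Same CSS selectors that already worked in the scrapy shell:
        # the page title, and the list items under div#section-2 (the symptoms list).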
        item["topic"] = response.css('title::text').extract_first()
        item["symptoms"] = response.css("article div#section-2 li::text").extract()

        yield item
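With this spider, the export command from the question should work unchanged. Scrapy's feed exports infer the output format from the file extension, so the same spider can also write CSV just by changing the output file name (the medical.csv name below is only an example):

scrapy crawl medical -o medical.json
scrapy crawl medical -o medical.csv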