Unable to store data scraped with Scrapy in JSON or CSV format

Time: 2017-03-06 14:34:54

Tags: json csv web-scraping scrapy

Here I want to store the data from the list given on the website's page. If I run the commands

response.css('title::text').extract_first()
response.css("article div#section-2 li::text").extract()

individually in the Scrapy shell, each one shows the expected output in the shell. Below is my code, which does not store the data in JSON or CSV format:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "medical"

    start_urls = ['https://medlineplus.gov/ency/article/000178.html/']


    def parse(self, response):
        yield
        {
            'topic': response.css('title::text').extract_first(),
            'symptoms': response.css("article div#section-2 li::text").extract()
        }

I tried to run this code using

scrapy crawl medical -o medical.json

1 Answer:

Answer 0 (score: 1)

You need to fix your URL: it is https://medlineplus.gov/ency/article/000178.htm, not https://medlineplus.gov/ency/article/000178.html/.

Also, and more importantly, you need to define an Item class and yield/return it from your spider's parse() callback:

import scrapy


class MyItem(scrapy.Item):
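    # One Field per value the spider exports; each becomes a key/column in the JSON/CSV output.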
    topic = scrapy.Field()
    symptoms = scrapy.Field()


class QuotesSpider(scrapy.Spider):
    name = "medical"

    allowed_domains = ['medlineplus.gov']
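    # Corrected start URL from the answer above: .htm, without the trailing slash.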
    start_urls = ['https://medlineplus.gov/ency/article/000178.htm']

    def parse(self, response):
        item = MyItem()

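        # Same CSS selectors that already worked in the scrapy shell:
        # the page title, and the list items under div#section-2 (the symptoms list).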
        item["topic"] = response.css('title::text').extract_first()
        item["symptoms"] = response.css("article div#section-2 li::text").extract()

        yield item
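With this spider, the export command from the question should work unchanged. Scrapy's feed exports infer the output format from the file extension, so the same spider can also write CSV just by changing the output file name (the medical.csv name below is only an example):

scrapy crawl medical -o medical.json
scrapy crawl medical -o medical.csv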