I want to store the data from the list given on the website's page. When I run the commands
response.css('title::text').extract_first() and
response.css("article div#section-2 li::text").extract()
individually in the scrapy shell, each shows the expected output there. Below is my code, which does not store the data in JSON or CSV format:
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "medical"
    start_urls = ['https://medlineplus.gov/ency/article/000178.html/']

    def parse(self, response):
        yield
        {
            'topic': response.css('title::text').extract_first(),
            'symptoms': response.css("article div#section-2 li::text").extract()
        }
I tried running this code with scrapy crawl medical -o medical.json
Answer (score: 1)
You need to fix your URL: it is https://medlineplus.gov/ency/article/000178.htm, not https://medlineplus.gov/ency/article/000178.html/.
Also, and more importantly, you need to define an Item class and yield/return it from your spider's parse() callback:
import scrapy

class MyItem(scrapy.Item):
    topic = scrapy.Field()
    symptoms = scrapy.Field()

class QuotesSpider(scrapy.Spider):
    name = "medical"
    allowed_domains = ['medlineplus.gov']
    start_urls = ['https://medlineplus.gov/ency/article/000178.htm']

    def parse(self, response):
        item = MyItem()
        item["topic"] = response.css('title::text').extract_first()
        item["symptoms"] = response.css("article div#section-2 li::text").extract()
        yield item
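Separately from the URL and Item fixes, note that the original snippet also contains a plain Python bug that would leave the output empty: a bare yield on its own line yields None, and the dict literal on the next line becomes a standalone expression that is never yielded at all. A minimal, Scrapy-free sketch of the difference (the function and dict names here are illustrative, not from the original code):

```python
def broken():
    # A bare `yield` produces None; the dict on the next line
    # is a separate expression statement and is never yielded.
    yield
    {"topic": "t"}

def fixed():
    # The dict must be part of the same yield expression.
    yield {"topic": "t"}

print(next(broken()))
print(next(fixed()))
```

In the spider this means writing `yield { ... }` with the opening brace on the same line as yield (or yielding an Item, as in the answer above), so that Scrapy actually receives the scraped data.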