How to store yielded request responses in Scrapy

Date: 2017-04-14 08:25:00

Tags: python scrapy

Hi, I'm new to Python and Scrapy, so this will be a noob question. I tried searching but couldn't find anything that directly answers it. I'm crawling the country pages linked from the Wikipedia list below and want to store each country's population in an array, then print them all at once. With the code below, an item is printed every time a request completes. How can I instead collect the results and print them in one batch at the end? Thanks.

import scrapy

class CrawlerSpider(scrapy.Spider):
    name = 'wikiCrawler'
    #allowed_domains = ['web']
    start_urls = ['https://en.wikipedia.org/wiki/List_of_sovereign_states']
    #counter = 1
    global i
    i = {}
    global list
    list = []

    def __init__(self):
        self.counter = 1
        pass

    def parse(self, response):

        for resultHref in response.xpath('//table[contains(@class, "wikitable")]//a[preceding-sibling::span[@class="flagicon"]]'):
            href = resultHref.xpath('./@href').extract_first()
            nameC = resultHref.xpath('./text()').extract_first()
            yield scrapy.Request(response.urljoin(href), callback=self.parse_item, meta={'Country': nameC})

    def parse_item(self, response):
        self.counter = self.counter + 1
        i['country'] = response.meta['Country']
        i['population'] = response.xpath('//tr[preceding-sibling::tr/th/a/text()="Population"]/td/text()').extract_first()
        yield i #this is where I would like to store the data instead of printing and then later print all together

1 Answer:

Answer 0 (score: 0):

Create the i variable inside the parse_item function instead of at class level, so each response builds its own dict. I tested it and it works, although the xpath selectors could probably use some improvement.
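The answer's point, that a dict created at class level is shared across every call to parse_item, can be reproduced without Scrapy. Below is a minimal sketch (the country/population pairs are made-up stand-ins for the scraped values): once each callback builds its own fresh dict and appends it to a list, the whole batch can be printed at the end, which is what the question asks for.

```python
# Simulated scraped values, standing in for the per-response data.
rows = [("France", "67 million"), ("Japan", "125 million")]

# The question's pattern: one class-level dict reused for every item.
# Each append stores a reference to the SAME object, so later
# responses overwrite what was "stored" earlier.
shared = {}
broken = []
for country, population in rows:
    shared['country'] = country
    shared['population'] = population
    broken.append(shared)  # a reference, not a snapshot

# The answer's fix: build a fresh dict per item, collect, print once.
results = []
for country, population in rows:
    item = {'country': country, 'population': population}
    results.append(item)

print(broken)   # both entries show Japan: the single dict was overwritten
print(results)  # each entry keeps its own country
```

In a real spider, that final print could live in Scrapy's `closed(self, reason)` hook, which runs once when the spider finishes; alternatively, just yield the per-response dicts and let `scrapy crawl wikiCrawler -o output.json` collect them into one file.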