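Hi, I'm new to Python and Scrapy, so this will be a noob question. I did search, but couldn't find anything that answers it directly. I'm trying to crawl the country pages linked from the list below and store each country's population in an array, then print them all at once. As you can see in the code below, it currently prints a result every time a request completes. How can I collect the results into an array and print them together in one batch? Thanks.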
import scrapy

class CrawlerSpider(scrapy.Spider):
    name = 'wikiCrawler'
    #allowed_domains = ['web']
    start_urls = ['https://en.wikipedia.org/wiki/List_of_sovereign_states']
    #counter = 1
    global i
    i = {}
    global list
    list = []

    def __init__(self):
        self.counter = 1
        pass

    def parse(self, response):
        for resultHref in response.xpath('//table[contains(@class, "wikitable")]//a[preceding-sibling::span[@class="flagicon"]]'):
            href = resultHref.xpath('./@href').extract_first()
            nameC = resultHref.xpath('./text()').extract_first()
            yield scrapy.Request(response.urljoin(href), callback=self.parse_item, meta={'Country': nameC})

    def parse_item(self, response):
        self.counter = self.counter + 1
        i['country'] = response.meta['Country']
        i['population'] = response.xpath('//tr[preceding-sibling::tr/th/a/text()="Population"]/td/text()').extract_first()
        yield i  # this is where I'd like to store the data instead of printing it, then print everything together later
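The effect of the single shared i dict can be reproduced without Scrapy. The plain-Python sketch below is an illustration, not the asker's code: `parse_item_shared` and the `fake_pages` values are hypothetical stand-ins for the country responses. Like the spider's parse_item, it writes into one module-level dict and yields it on every call, so if the yielded items are appended to a list for later printing, every entry is the same object and shows only the last country.

```python
# Hypothetical stand-in data for the country pages (placeholder values, not scraped).
fake_pages = [("Country A", "100"), ("Country B", "200")]

i = {}  # one dict shared by every call, like the module-level `i` in the spider

def parse_item_shared(country, population):
    # Overwrites and re-yields the same dict object on every call.
    i["country"] = country
    i["population"] = population
    yield i

collected = []
for country, population in fake_pages:
    collected.extend(parse_item_shared(country, population))

# Both list entries reference the same dict, so both show "Country B".
print(collected)
```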
Answer (score: 0):
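I created the i variable inside the parse_item function instead of on the class.
I tested it, and although the XPath selector could probably use some improvement, it works.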
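A minimal plain-Python sketch of that fix, combined with the batch printing the question asks for. `CountryCollector` and the sample values are hypothetical stand-ins, not Scrapy API. Each call builds a fresh dict, results are appended to a list on the instance, and everything is printed once at the end. In a real spider the same idea would mean setting self.results = [] in `__init__`, appending in parse_item, and printing in the spider's `closed(reason)` method, which Scrapy calls when the crawl finishes.

```python
class CountryCollector:
    """Hypothetical stand-in for the spider, keeping only the parts relevant to the fix."""

    def __init__(self):
        self.results = []  # one list per instance, filled as items arrive

    def parse_item(self, country, population):
        # A new dict on every call, so earlier items are never overwritten.
        item = {"country": country, "population": population}
        self.results.append(item)
        return item

    def closed(self, reason):
        # Print everything in one batch once all items have been collected.
        for item in self.results:
            print(item["country"], item["population"])


collector = CountryCollector()
# Placeholder values, not real scraped data.
for country, population in [("Country A", "100"), ("Country B", "200")]:
    collector.parse_item(country, population)
collector.closed("finished")
```

An alternative that needs no extra code is to keep parse_item yielding a fresh dict and let Scrapy's feed exports write all items to one file, for example `scrapy crawl wikiCrawler -o countries.json`.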