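I extract data from one page, then iterate over the URLs found on that page and fetch additional information from a second page. But the output is wrong (see the screenshot): the items produced by the second "def" are shifted down, and their order does not match the items from the first "def"! Please check my code structure below. Thanks!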
***
def parse(self, response):
    rows = ***
    for row in rows:
        item = Items()
        item['number'] = ***
        item['name'] = ***
        ***
        yield item
        urls = ***
        for url in urls.extract():
            yield Request(urlparse.urljoin(response.url, url), callback=self.parse_player)

def parse_player(self, response):
    item = Items()
    item['mainposition'] = ***
    item['altposition'] = ***
    yield item
The result is in the screenshot: https://snag.gy/tCaDm3.jpg
Answer 0 (score: 0)
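I think you should collect the name and the other fields on the first page, but not yield the item there. Every yield produces a separate output record, so yielding in both callbacks gives you two partial records per player. Instead, pass the item to the next page via meta, and only yield it once it is complete. Like this: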
def parse(self, response):
    rows = ***
    for row in rows:
        item = Items()
        item['number'] = ***
        item['name'] = ***
        # don't yield item here!
        urls = ***
        for url in urls.extract():
            yield Request(response.urljoin(url), self.parse_player, meta={'item': item})

def parse_player(self, response):
    item = response.meta['item']
    item['mainposition'] = ***
    item['altposition'] = ***
    yield item
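
To see why carrying the item along keeps the fields together, here is a small Scrapy-free simulation of the same flow. It is only a sketch: the `Request` and `Response` classes are hypothetical stand-ins (not the Scrapy ones), `ROWS` and `DETAILS` are made-up data, and `crawl` is a toy engine that handles pending requests in random order to mimic responses arriving out of order.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict


# Hypothetical stand-ins for Scrapy's Request/Response, just enough for the demo.
@dataclass
class Request:
    url: str
    callback: Callable
    meta: Dict = field(default_factory=dict)


@dataclass
class Response:
    url: str
    meta: Dict


# Made-up data: rows of the list page and one detail page per player.
ROWS = [(1, "Alice", "/p/1"), (2, "Bob", "/p/2"), (3, "Carol", "/p/3")]
DETAILS = {"/p/1": "GK", "/p/2": "CB", "/p/3": "ST"}


def parse(response):
    for number, name, url in ROWS:
        item = {"number": number, "name": name}
        # The partial item is not yielded; it travels with the request instead.
        yield Request(url, parse_player, meta={"item": item})


def parse_player(response):
    item = response.meta["item"]
    item["mainposition"] = DETAILS[response.url]
    yield item  # one complete record per player


def crawl(seed=0):
    """Toy engine: handles pending requests in random order."""
    rng = random.Random(seed)
    pending = list(parse(Response("/list", {})))
    items = []
    while pending:
        req = pending.pop(rng.randrange(len(pending)))
        for result in req.callback(Response(req.url, req.meta)):
            if isinstance(result, Request):
                pending.append(result)
            else:
                items.append(result)
    return items


if __name__ == "__main__":
    for item in crawl():
        print(item)
```

The records may come out in any order, but each one holds the number, name and position of the same player, because the second callback fills in the very item that was created for that row. If the final order matters, sort the exported data on a field such as `number` afterwards.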