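I've been debugging this for a long time and can't figure out why the append method won't work the way I want. I want to go through each player entry on the site I'm scraping (ESPN) and store the extracted data in my players1 array. When I print(play), it shows me 15 different player entries, but when I append them to the players1 array and return it at the end of the loop, it only shows me the last (or first) player 15 times.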
    def parseRoster(self, response):
        play = response.meta['play']
        players1 = []
        int = 0
        for players in response.xpath("//td[@class='sortcell']"):
            play['name'] = players.xpath("a/text()").extract()[0]
            play['position'] = players.xpath("following-sibling::td[1]").extract()[0]
            play['age'] = players.xpath("following-sibling::td[2]").extract()[0]
            play['height'] = players.xpath("following-sibling::td[3]").extract()[0]
            play['weight'] = players.xpath("following-sibling::td[4]").extract()[0]
            play['college'] = players.xpath("following-sibling::td[5]").extract()[0]
            play['salary'] = players.xpath("following-sibling::td[6]").extract()[0]
            print(play)
            players1.append(play)
        print(players1)
        return players1
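If you'd like to see the rest of my code, let me know and I'll upload it. In my main code I have to create a Request object and populate its meta right after declaring it.
Edit: The other reason I'm not just extracting all the data into one list (which is basically why there's a [0] at the end of each extraction) is that the table has a lot of empty entries, and I feel this way is easier to send to my database.
Edit 1: OK, so I put print(players1) inside the for loop and saw that the loop somehow overwrites the (initially empty) array with the latest player's name. I'm not sure why this happens, because I've used it the same way before and it did what I wanted.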
Answer 0 (score: 1)
I assume play = response.meta['play'] references an Item instance that you created in a previous callback.
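In the for players in ... loop, you overwrite the fields of that same instance on every iteration and append that same instance 15 times. You end up with a list that contains the same Python object 15 times, so every entry shows the values written last.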
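To illustrate the aliasing described above, here is a minimal sketch that uses a plain dict as a stand-in for the Scrapy Item. The dict contents and the names list are made up for the example and are not taken from the question's spider.

```python
# A plain dict stands in for the Item taken from response.meta (assumption for illustration).
play = {"team": "example"}
players1 = []

for name in ["A", "B", "C"]:
    play["name"] = name      # mutates the one shared dict
    players1.append(play)    # appends another reference to that same dict

# Every list element is the very same object, so all of them show the last name written.
names = [p["name"] for p in players1]
all_same_object = all(p is play for p in players1)
print(names)
print(all_same_object)
```

Appending to a list stores a reference, not a snapshot, which is why printing play inside the loop looks right while the final list does not.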
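On each loop iteration, you need to make a copy of the play instance from response.meta and then set the different fields on that copy. Something like this should work: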
    def parseRoster(self, response):
        # Item passed in from the previous callback; used only as a template
        play_original = response.meta['play']
        players1 = []
        for players in response.xpath("//td[@class='sortcell']"):
            # Work on a fresh copy so each player gets its own object
            play = play_original.copy()
            play['name'] = players.xpath("a/text()").extract()[0]
            play['position'] = players.xpath("following-sibling::td[1]").extract()[0]
            play['age'] = players.xpath("following-sibling::td[2]").extract()[0]
            play['height'] = players.xpath("following-sibling::td[3]").extract()[0]
            play['weight'] = players.xpath("following-sibling::td[4]").extract()[0]
            play['college'] = players.xpath("following-sibling::td[5]").extract()[0]
            play['salary'] = players.xpath("following-sibling::td[6]").extract()[0]
            print(play)
            players1.append(play)
        print(players1)
        return players1
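The same copy-per-iteration pattern can be tried without Scrapy. The sketch below again uses a plain dict in place of the Item; build_players, base, and the sample names are hypothetical and exist only for this example.

```python
def build_players(base, names):
    """Return one independent dict per name, each starting from a copy of base."""
    players = []
    for name in names:
        player = base.copy()   # fresh shallow copy on every iteration
        player["name"] = name  # only the copy is modified
        players.append(player)
    return players


base = {"team": "example"}  # stand-in for the Item from response.meta
result = build_players(base, ["A", "B", "C"])
result_names = [p["name"] for p in result]
print(result_names)
```

Note that copy() here is a shallow copy: it is enough when the fields are strings as in the question, but if a field holds a mutable value such as a list, the copies would still share it, and copy.deepcopy from the standard library may be the safer choice.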