Question

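I have been debugging this for a long time and cannot figure out why append will not work the way I want. I want to go through each player entry on the site (ESPN) I am scraping, extract its data, and store it in my players1 list. When I print(play), it shows me 15 different player entries, but when I append them to the players1 list and return it at the end of the loop, it shows only the last (or first) player 15 times.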

def parseRoster(self, response):
    play = response.meta['play']
    players1 = []
    int = 0
    for players in response.xpath("//td[@class='sortcell']"):
        play['name'] = players.xpath("a/text()").extract()[0]
        play['position'] = players.xpath("following-sibling::td[1]").extract()[0]
        play['age'] = players.xpath("following-sibling::td[2]").extract()[0]
        play['height'] = players.xpath("following-sibling::td[3]").extract()[0]
        play['weight'] = players.xpath("following-sibling::td[4]").extract()[0]
        play['college'] = players.xpath("following-sibling::td[5]").extract()[0]
        play['salary'] = players.xpath("following-sibling::td[6]").extract()[0]
        print(play)
        players1.append(play)
    print(players1)
    return players1

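If you would like to see the rest of my code, let me know and I will upload it. I had to create a Request object and fill in its meta right after declaring the Request object in my main code.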

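Edit: Another reason I am not just extracting all the data into one list (which is basically why there is a [0] at the end of each extract) is that the table has a lot of empty entries, and I felt this way would be easier to send to my database.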

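Edit 1: OK, so I put print(players1) inside the for loop and saw that the loop is somehow overwriting the list, which starts out empty, with the latest player's name. I am not sure why this is happening, since I have used it the same way before and it did what I wanted.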

Answer 1

I assume play = response.meta['play'] refers to an Item instance that you created in an earlier callback.

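Inside the for players in ... loop, you overwrite the fields of that same instance and append that same instance 15 times. You are building a list that contains the same Python object 15 times.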
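To see the effect in isolation, here is a minimal sketch that uses a plain dict as a stand-in for the Scrapy item (the dict and the sample names are assumptions for illustration, not part of the original spider). Appending an object to a list stores a reference to it, not a snapshot of its current contents:

```python
# A plain dict stands in for the Scrapy item (assumption for illustration).
shared = {}
rows = []

for name in ["A", "B", "C"]:
    shared["name"] = name   # mutates the one and only dict
    rows.append(shared)     # appends a reference, not a snapshot

names_seen = [row["name"] for row in rows]
all_same_object = all(row is shared for row in rows)

print(names_seen)        # ['C', 'C', 'C']
print(all_same_object)   # True
```

Every list entry points at the same dict, so each entry shows whatever was written last.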

For each loop iteration, you need to copy the play instance from response.meta and then set the fields on the copy. Something like this should work:

def parseRoster(self, response):
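    # Item passed in from the previous callback through the request meta.
    play_original = response.meta['play']
    players1 = []
    for players in response.xpath("//td[@class='sortcell']"):
        # Copy the item on every iteration so each player gets its own object.
        play = play_original.copy()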

        play['name'] = players.xpath("a/text()").extract()[0]
        play['position'] = players.xpath("following-sibling::td[1]").extract()[0]
        play['age'] = players.xpath("following-sibling::td[2]").extract()[0]
        play['height'] = players.xpath("following-sibling::td[3]").extract()[0]
        play['weight'] = players.xpath("following-sibling::td[4]").extract()[0]
        play['college'] = players.xpath("following-sibling::td[5]").extract()[0]
        play['salary'] = players.xpath("following-sibling::td[6]").extract()[0]
        print(play)
        players1.append(play)
    print(players1)
    return players1
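The same copy-per-iteration pattern can be checked without Scrapy. The following is a rough sketch with plain dicts; build_rows, the template dict, and the sample names are hypothetical and only illustrate the idea:

```python
def build_rows(template, names):
    """Return one independent dict per name, based on a shared template."""
    rows = []
    for name in names:
        row = template.copy()   # fresh shallow copy on every iteration
        row["name"] = name      # only the copy is modified
        rows.append(row)
    return rows


template = {"team": "Example Team"}   # hypothetical shared fields
rows = build_rows(template, ["A", "B", "C"])

print([row["name"] for row in rows])  # ['A', 'B', 'C']
print("name" in template)             # False
```

Note that copy() here is a shallow copy: if a field holds a mutable value such as a list, the copies still share that inner object, and copy.deepcopy from the standard library may be the safer choice in that case.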
