Problem with my Scrapy callback function

Posted: 2015-01-02 14:01:10

Tags: python-2.7 append scrapy

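I have been debugging this for a long time and cannot work out why the append method is not doing what I want. I want to visit each player entry on the website I am scraping (ESPN), extract its data, and store it in my players1 list. When I print(play), it shows me 15 different player entries, but when I append them to players1 and return the list at the end of the loop, it shows only the last (or first) player 15 times.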

def parseRoster(self, response):
    play = response.meta['play']
    players1 = []
    int = 0
    for players in response.xpath("//td[@class='sortcell']"):
        play['name'] = players.xpath("a/text()").extract()[0]
        play['position'] = players.xpath("following-sibling::td[1]").extract()[0]
        play['age'] = players.xpath("following-sibling::td[2]").extract()[0]
        play['height'] = players.xpath("following-sibling::td[3]").extract()[0]
        play['weight'] = players.xpath("following-sibling::td[4]").extract()[0]
        play['college'] = players.xpath("following-sibling::td[5]").extract()[0]
        play['salary'] = players.xpath("following-sibling::td[6]").extract()[0]
        print(play)
        players1.append(play)
    print(players1)
    return players1

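If you would like to see the rest of my code, let me know and I will upload it. I have to create a Request object and populate its meta attribute right after declaring the Request in my main code.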

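Edit: Another reason I do not simply extract all the data into one list (which is essentially why there is a [0] at the end of each extract call) is that the table has many empty entries, and I feel this approach makes it easier to send the data to my database.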

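Edit1: OK, so I put print(players1) inside the for loop and saw that the loop somehow overwrites the array with the latest player's name. I am not sure why this happens, because I have used append the same way before and it did what I wanted.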

1 answer:

Answer 0 (score: 1)

I assume that play = response.meta['play'] refers to an Item instance that you created in a previous callback.

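In the for players in ... loop, you overwrite the fields of that same instance and append the same instance 15 times. You are building a list that contains the same Python object 15 times.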
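The effect can be reproduced without Scrapy. The sketch below is only an illustration: a plain dict stands in for the Item, and the team and player names are made up.

```python
# A plain dict stands in for the Scrapy Item (assumption for illustration).
play = {"team": "Example Team"}
names = ["Player A", "Player B", "Player C"]  # hypothetical data

players1 = []
for name in names:
    play["name"] = name      # mutates the one shared object
    players1.append(play)    # appends another reference to that same object

# Every list element is the same object, so all of them show the last name written.
all_same_object = all(p is players1[0] for p in players1)
seen_names = [p["name"] for p in players1]
print(all_same_object)
print(seen_names)
```

Printing play inside the loop looks correct because, at that moment, the single object holds the current player's values; the list only ever holds references to it.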

For each loop iteration, you need to copy the Item instance from response.meta and then set the fields on the copy. Something like this should work:

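def parseRoster(self, response):
    # Item created in the previous callback and passed along via Request.meta
    play_original = response.meta['play']
    players1 = []
    for players in response.xpath("//td[@class='sortcell']"):

        # Work on a fresh copy so each list entry is a distinct object
        play = play_original.copy()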

        play['name'] = players.xpath("a/text()").extract()[0]
        play['position'] = players.xpath("following-sibling::td[1]").extract()[0]
        play['age'] = players.xpath("following-sibling::td[2]").extract()[0]
        play['height'] = players.xpath("following-sibling::td[3]").extract()[0]
        play['weight'] = players.xpath("following-sibling::td[4]").extract()[0]
        play['college'] = players.xpath("following-sibling::td[5]").extract()[0]
        play['salary'] = players.xpath("following-sibling::td[6]").extract()[0]
        print(play)
        players1.append(play)
    print(players1)
    return players1
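The same copy-per-iteration pattern can be checked in isolation. This is a minimal sketch with a plain dict in place of the Item; build_players and the sample names are hypothetical and not part of the original spider.

```python
def build_players(base, names):
    """Return one independent dict per name, each starting from a copy of base."""
    result = []
    for name in names:
        player = base.copy()   # fresh shallow copy on every iteration
        player["name"] = name  # only the copy is modified
        result.append(player)
    return result


base = {"team": "Example Team"}  # stands in for response.meta['play']
players1 = build_players(base, ["Player A", "Player B", "Player C"])
names_out = [p["name"] for p in players1]
print(names_out)
```

A shallow copy is enough here because the fields being set are strings. If the original object held nested mutable values such as lists, those would still be shared between copies, and copy.deepcopy from the standard library would be the safer choice.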