Add rows to a pandas DataFrame at the end of a loop

Time: 2020-08-23 21:39:23

Tags: python pandas dataframe append

I am trying to add rows to a DataFrame as part of a loop.

The program loops over a set of URLs and extracts the data from each one into DataFrames:

for id in game_ids:
    df_team_final = []
    df_player_final = []
    url = 'https://www.fibalivestats.com/data/' + id + '/data.json'
    content = requests.get(url)
    data = json.loads(content.content)

At the end of the loop body, I use concat to combine the two DataFrames for the home and away teams (and likewise for the players):

    team_full = pd.concat([df_home_team, df_away_team])
    player_full = pd.concat([df_home_player_merge, df_away_player_merge])

Then, outside the loop, I save the results to Excel:

# create the spreadsheet (mode='w' by default, so an existing file is overwritten);
# the with-block saves and closes the file on exit (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter('Box Data.xlsx', engine='openpyxl') as writer:
    team_full.to_excel(writer, sheet_name='Team Stats', index=False)
    player_full.to_excel(writer, sheet_name='Player Stats', index=False)

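Because I loop over several web pages, I need to keep adding to the DataFrames as I go. In the current form, each iteration simply overwrites the data from the previous URL.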

What is the best way to append to a DataFrame at the end of each loop iteration?

Thanks

1 answer:

Answer 0 (score: 1)

Since we can't see your full code, I can only give a rough outline here.

I assume you are not appending the scraped data to any kind of container, so it is lost on the next iteration.
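To make the problem concrete, here is a minimal offline sketch. It assumes a hypothetical `fake_game_frame` helper in place of the real request-and-parse step, and contrasts a list that is re-created inside the loop with one created once before it:

```python
import pandas as pd


def fake_game_frame(game_id):
    # Hypothetical stand-in for the per-game scraping step
    # (requests.get + json.loads + building the home/away DataFrames).
    return pd.DataFrame({"game_id": [game_id, game_id], "team": ["home", "away"]})


game_ids = ["g1", "g2", "g3"]

# Container created inside the loop: it is reset on every iteration,
# so after the loop it only holds the frame for the last game.
for game_id in game_ids:
    frames_inside = []
    frames_inside.append(fake_game_frame(game_id))

# Container created once, before the loop: every game's frame is kept.
frames_outside = []
for game_id in game_ids:
    frames_outside.append(fake_game_frame(game_id))

lost = pd.concat(frames_inside)   # only the rows for "g3"
kept = pd.concat(frames_outside)  # rows for "g1", "g2" and "g3"
```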

import json

import pandas as pd
import requests

# empty lists, created outside of the loop, to store the data
df_team_final = []
df_player_final = []

for id in game_ids:
    url = 'https://www.fibalivestats.com/data/' + id + '/data.json'
    content = requests.get(url)
    data = json.loads(content.content)

    # create the DataFrames you need from `data`
    # (df_home_team, df_away_team, etc.)
    # and append them to the containers

    team_full = pd.concat([df_home_team, df_away_team])
    player_full = pd.concat([df_home_player_merge, df_away_player_merge])

    df_team_final.append(team_full)
    df_player_final.append(player_full)

Now that your DataFrames are stored in lists, you can combine them with pandas.concat:
# outside of the loop
team_full = pd.concat(df_team_final)
player_full = pd.concat(df_player_final)
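A small sketch of that final concat step, using made-up per-game frames in place of the scraped ones (the team names, points and game ids are all hypothetical). Calling concat once after the loop is generally cheaper than concatenating inside it, because each concat copies all the data collected so far; DataFrame.append, which some older answers suggest, was deprecated in pandas 1.4 and removed in 2.0. The `keys=` variant is optional and shows one way to record which game each row came from:

```python
import pandas as pd

# Hypothetical per-game results standing in for the frames in df_team_final.
df_team_final = [
    pd.DataFrame({"team": ["A", "B"], "pts": [80, 75]}),
    pd.DataFrame({"team": ["C", "D"], "pts": [90, 88]}),
]

# One concat after the loop; ignore_index=True renumbers the rows 0..n-1
# instead of repeating 0, 1, 0, 1 from each per-game frame.
team_full = pd.concat(df_team_final, ignore_index=True)

# Optional: keys= labels each block with its (hypothetical) game id as the
# outer level of a MultiIndex; reset_index then moves that label into a column.
game_ids = ["game_1", "game_2"]
team_by_game = (
    pd.concat(df_team_final, keys=game_ids, names=["game_id", "row"])
    .reset_index(level="game_id")
    .reset_index(drop=True)
)
```

Since the Excel export uses index=False, the index numbering does not matter for the saved file; ignore_index mainly helps if you keep working with the combined DataFrame in Python.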

Then save everything in one go:

# the with-block saves and closes the file on exit (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter('Box Data.xlsx', engine='openpyxl') as writer:
    team_full.to_excel(writer, sheet_name='Team Stats', index=False)
    player_full.to_excel(writer, sheet_name='Player Stats', index=False)

Edit

From the file you shared, I can see that you create the containers inside the loop:


But you should create them before the loop starts:

# initialize them here
df_team_final = []
df_player_final = []
for id in game_ids: