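I'm trying to add rows to a DataFrame as part of a loop.
The program loops over a list of URLs and extracts the data from each one into DataFrames: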
for id in game_ids:
    df_team_final = []
    df_player_final = []
    url = 'https://www.fibalivestats.com/data/' + id + '/data.json'
    content = requests.get(url)
    data = json.loads(content.content)
At the end of each iteration, I use concat to combine the two home/away DataFrames for the teams (and likewise for the players):
    team_full = pd.concat([df_home_team, df_away_team])
    player_full = pd.concat([df_home_player_merge, df_away_player_merge])
Then, outside the loop, I save the result to Excel:
# if the file can't be found, create a new spreadsheet
writer = pd.ExcelWriter('Box Data.xlsx', engine='openpyxl')
team_full.to_excel(writer, sheet_name='Team Stats', index=False)
player_full.to_excel(writer, sheet_name='Player Stats', index=False)
writer.save()
writer.close()
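Because I'm looping over several pages, I need to keep adding to the DataFrames as I go. As written, each iteration simply overwrites the data from the previous URL.
What is the best way to append or add to a DataFrame at the end of each loop iteration?
Thanks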
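Answer 0 (score: 1)
Since we can't see the complete code, I can only give a rough outline here.
I assume you aren't appending the scraped data to some kind of container, so it is lost on the next iteration.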
# empty lists outside of the loop to store data
df_team_final = []
df_player_final = []
for id in game_ids:
    url = 'https://www.fibalivestats.com/data/' + id + '/data.json'
    content = requests.get(url)
    data = json.loads(content.content)
    # create the dataframes that you need
    # df_home_team, df_away_team etc
    # and append data to containers
    team_full = pd.concat([df_home_team, df_away_team])
    player_full = pd.concat([df_home_player_merge, df_away_player_merge])
    df_team_final.append(team_full)
    df_player_final.append(player_full)
Now that the DataFrames are stored in lists, you can combine them with pandas.concat:
# outside of the loop
team_full = pd.concat(df_team_final)
player_full = pd.concat(df_player_final)
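For reference, here is a minimal, self-contained sketch of the same collect-then-concat pattern using made-up data. The helper build_team_frame, the game ids, and the column names are all hypothetical stand-ins for the real scraping step, which isn't shown in the question.

```python
import pandas as pd


def build_team_frame(game_id):
    """Hypothetical stand-in for the per-game scraping step.

    The real code would build this frame from the downloaded JSON.
    """
    return pd.DataFrame({
        "game_id": [game_id, game_id],
        "side": ["home", "away"],
        "points": [80, 75],  # made-up values
    })


game_ids = ["1001", "1002", "1003"]  # made-up ids

team_frames = []  # created once, before the loop
for game_id in game_ids:
    team_frames.append(build_team_frame(game_id))

# A single concat after the loop; ignore_index=True renumbers the rows 0..n-1
team_full = pd.concat(team_frames, ignore_index=True)
print(team_full.shape)  # (6, 3)
```

Collecting the frames in a list and calling concat once is generally cheaper than concatenating onto a growing DataFrame inside the loop, because each concat copies the data.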
And then save:
writer = pd.ExcelWriter('Box Data.xlsx', engine='openpyxl')
team_full.to_excel(writer, sheet_name='Team Stats', index=False)
player_full.to_excel(writer, sheet_name='Player Stats', index=False)
# close() writes the workbook to disk; the separate writer.save() call was removed in pandas 2.0
writer.close()
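As an alternative, the writer can be used as a context manager so the file is saved and closed even if one of the to_excel calls raises. This is only a sketch: the function name save_box_data and its default path are my own choices, not something from the question, and it assumes openpyxl is installed.

```python
import pandas as pd


def save_box_data(team_full, player_full, path="Box Data.xlsx"):
    """Write both frames to one workbook (hypothetical helper)."""
    # Leaving the with-block saves and closes the workbook
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        team_full.to_excel(writer, sheet_name="Team Stats", index=False)
        player_full.to_excel(writer, sheet_name="Player Stats", index=False)
```

You would call it once, after the loop, as save_box_data(team_full, player_full).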
From the file you shared, I can see that you create the containers inside the loop:
But you should put them before the loop starts:
# initialize them here
df_team_final = []
df_player_final = []
for id in game_ids:
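To see why the position matters, here is a tiny sketch with plain lists and made-up ids (no scraping involved). A list created inside the loop is replaced with a fresh empty one on every pass, so only the last item survives.

```python
game_ids = ["1001", "1002", "1003"]  # made-up ids

for game_id in game_ids:
    inside = []  # re-created on every pass, so earlier items are discarded
    inside.append(game_id)

outside = []  # created once, so it keeps everything
for game_id in game_ids:
    outside.append(game_id)

print(inside)   # ['1003']
print(outside)  # ['1001', '1002', '1003']
```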