Vectorizing pandas DataFrame operations

Time: 2019-09-03 16:43:58

Tags: python pandas vectorization

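I have a master data frame of players and their stats after each game (about 500K rows), and a data frame of teams (30K) with their 18-player rosters at different dates. For each team at each date, I want to pick the best role players from its roster (for example the attacker, defender, and winger) based on their stats, and I want to do it quickly. At the moment, my pandas apply solution takes about 54 CPU seconds per 1,000 rows. I have 50,000 rows, so the full run takes about 45 minutes. I would like it to be much faster, say around 10 minutes.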

I used apply. Here is a toy example:

import pandas as pd

# this is the master data frame of players and their stats
player_data={'player': ['Tom', 'Dick', 'Harry', 'Sally'], 'goals':[1,2,1,2], 'tackles':[4,3,5,3]}
df_players=pd.DataFrame.from_dict(player_data)
# these are the teams and their rosters
team_data={'name':['Cougars','Tigers'], 'roster':[{'Tom', 'Dick'}, {'Harry', 'Sally'}]}
df_teams=pd.DataFrame.from_dict(team_data)

# this is the function called by apply; x is one row of df_teams
def get_role_players(x, df_players):
    # top scorer on the roster
    attacker=df_players[df_players.player.isin(x["roster"])].sort_values("goals", ascending=False).iloc[0].player
    # top tackler on the roster
    defender=df_players[df_players.player.isin(x["roster"])].sort_values("tackles", ascending=False).iloc[0].player
    return pd.Series([x["name"],attacker, defender], index=['team','attacker', 'defender'])

# here I apply the function to every team row
role_players=df_teams.apply(lambda x: get_role_players(x, df_players), axis=1)
print(role_players)

The result is:

team     attacker  defender
Cougars  Dick      Tom
Tigers   Sally     Harry
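
For comparison, here is a rough sketch of one way the same toy result could be computed without a row-wise apply: flatten each roster to one row per (team, player), merge the stats in once, and pick the per-team maximum with groupby and idxmax. This is only an illustration under assumptions, not a tested solution for the full data: the names `long`, `best_goals`, and `best_tackles` are made up for the example, rosters are assumed to be sets of player names as in the toy data, and `DataFrame.explode` requires pandas 0.25 or later.

```python
import pandas as pd

# Same toy data as in the example above.
df_players = pd.DataFrame({
    "player": ["Tom", "Dick", "Harry", "Sally"],
    "goals": [1, 2, 1, 2],
    "tackles": [4, 3, 5, 3],
})
df_teams = pd.DataFrame({
    "name": ["Cougars", "Tigers"],
    "roster": [{"Tom", "Dick"}, {"Harry", "Sally"}],
})

# One row per (team, player): turn each roster set into a sorted list,
# explode it, then attach the stats with a single merge.
# "long" is a hypothetical name for this flattened table.
long = (
    df_teams.assign(player=df_teams["roster"].map(sorted))
    .drop(columns="roster")
    .explode("player")
    .merge(df_players, on="player", how="inner")
)

# idxmax returns the row label of the maximum within each team;
# the merge leaves a fresh unique index, so .loc selects exactly one row per team.
best_goals = long.loc[long.groupby("name")["goals"].idxmax()]
best_tackles = long.loc[long.groupby("name")["tackles"].idxmax()]

# Both selections come back in the same (sorted) team order,
# so the columns can be lined up positionally.
role_players = pd.DataFrame({
    "team": best_goals["name"].to_numpy(),
    "attacker": best_goals["player"].to_numpy(),
    "defender": best_tackles["player"].to_numpy(),
})
print(role_players)
```

On ties, idxmax keeps the first matching row, which may differ from whatever the sort-based version happens to pick. If the real team table also carries a date column (not shown in the toy example), grouping by both the team name and that column would presumably give one selection per team per date.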

0 answers:

No answers