Question

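I'm still quite new to Python and pandas, and I have a problem I'm not sure how to solve. I have a pandas DataFrame of hockey players who played for more than one team in the same season: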

Player         Season      Team      GP        G      A       TP      
Player A        2020        A        10        8      3       11
Player A        2020        B        25        10     5       15

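I'd like to combine the rows for the same player in the same season, and order the columns by the team the player played the most games for. In the example above, all of Team B's numbers come first, because Player A played the most games for Team B.

For example, the df above would become (HTeam stands for highest team):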

Player        Season      HTeam      HGP    HG      HA     HTP     LTeam      LGP        LG      LA       LTP
Player A      2020          B        25     10      5      15       A         10         8       3        11

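My first thought for solving this was to use a series of groupby max calls, but I'm not sure whether that would give the desired result. Any help would be much appreciated!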

Answer 1

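Let's try this:

import numpy as np

#For one season determine which of two records has the most games played
#This logic can use something like pd.cut for more than two teams in a season
#Note: if both records have the same GP, both are labelled 'H' and the unstack below fails on the duplicate index
df['H/L'] = np.where(df['GP'] < df.groupby(['Player', 'Season'])['GP'].transform('max') ,'L','H')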

#Reshape the dataframe using indexes and unstack
a = df.set_index(['Player','Season','H/L']).unstack()

#Flatten multiindex header created by reshaping
a.columns = [f'{j}{i}' for i,j in a.columns]

#sort and move indexes back into the dataframe columns
a = a.sort_index(axis=1).reset_index()
print(a)

Output:

  Player  Season  HA  HG  HGP  HTP HTeam  LA  LG  LGP  LTP LTeam
0      A    2020   5  10   25   15     B   3   8   10   11     A
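
The comment above mentions that more than two teams in a season would need different logic. One possible way to handle that, as a rough sketch rather than part of the original answer, is to number each player's rows by games played and unstack on that number. The third team (C), the `rank` column, and the `T1_`/`T2_` column prefixes are assumptions made up for this illustration.

```python
import pandas as pd

# Sample data; the row for team C is hypothetical, added to show a third team
df = pd.DataFrame({
    "Player": ["Player A", "Player A", "Player A"],
    "Season": [2020, 2020, 2020],
    "Team": ["A", "B", "C"],
    "GP": [10, 25, 5],
    "G": [8, 10, 1],
    "A": [3, 5, 0],
    "TP": [11, 15, 1],
})

# Sort by games played (most first), then number the rows 1, 2, 3, ...
# within each (Player, Season) group. Ties keep their original row order.
df = df.sort_values("GP", ascending=False, kind="stable")
df["rank"] = df.groupby(["Player", "Season"]).cumcount() + 1

# Reshape so that each rank gets its own set of columns
wide = df.set_index(["Player", "Season", "rank"]).unstack("rank")

# Flatten the (stat, rank) column MultiIndex into names like T1_Team, T2_GP
wide.columns = [f"T{rank}_{stat}" for stat, rank in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because `cumcount` gives every row in a group a distinct number, this variant should not hit the duplicate-index problem when two teams have the same GP. Players with fewer teams than the maximum would get missing values in the higher-numbered columns.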

Answer 2

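Sort, then use groupby + head/tail and concat the results. If a player has only 1 entry, that entry will be treated as both H and L, so you can filter those out if necessary.

import pandas as pd

#Sort ascending by GP so the last row in each group has the most games played
df = df.sort_values('GP')

gps = ['Player', 'Season']
pd.concat([df.groupby(gps).tail(1).set_index(gps).add_prefix('H'), 
           df.groupby(gps).head(1).set_index(gps).add_prefix('L')], axis=1)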

#               HTeam  HGP  HG  HA  HTP LTeam  LGP  LG  LA  LTP
#Player  Season                                                
#PlayerA 2020       B   25  10   5   15     A   10   8   3   11
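
The filtering of single-entry players mentioned above could look something like the sketch below. It is only one possible approach: it drops those players before reshaping, using a group-size mask. The `Player B` row and the names `multi` and `res` are assumptions introduced for the example.

```python
import pandas as pd

# Sample data; "Player B" is a hypothetical player with a single entry
df = pd.DataFrame({
    "Player": ["Player A", "Player A", "Player B"],
    "Season": [2020, 2020, 2020],
    "Team": ["A", "B", "C"],
    "GP": [10, 25, 40],
    "G": [8, 10, 12],
    "A": [3, 5, 20],
    "TP": [11, 15, 32],
})

gps = ["Player", "Season"]

# Keep only (Player, Season) groups that have more than one row,
# then sort ascending by games played
multi = df[df.groupby(gps)["GP"].transform("size") > 1].sort_values("GP")

# Last row per group = most games (H), first row per group = fewest games (L)
res = pd.concat(
    [multi.groupby(gps).tail(1).set_index(gps).add_prefix("H"),
     multi.groupby(gps).head(1).set_index(gps).add_prefix("L")],
    axis=1,
)
print(res)
```

If single-entry players should stay in the result, an alternative would be to keep them and blank out their L columns instead of dropping the rows.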
