Pandas DataFrame: merge columns with the same name, data separated by commas

Time: 2020-02-24 17:37:12

Tags: python pandas dataframe

The DataFrame looks like this.

Let's call it df1:

teamname  player.1  player.2  player.3
xyz        abc        nan       def
gh1        nan        hgf       jnr
oed        jeo        nan       nan

The output should look like this.

Let's call it df2:

teamname player
xyz       abc
          def
gh1       hgf
          jnr
oed       jeo

2 answers:

Answer 0: (score: 0)

import pandas as pd

# Names of the player columns (player.1, player.2, ...)
player_cols = [col for col in df1.columns if 'player' in col.lower()]

df_parts = []  # List to store the mini-DataFrames, one per player column
for col in player_cols:
    df_auxiliary = df1[['teamname', col]]
    df_auxiliary = df_auxiliary.rename(columns={col: 'Players'})
    df_auxiliary = df_auxiliary.dropna()  # Skip empty player slots
    df_parts.append(df_auxiliary)

df2 = pd.concat(df_parts)  # Create the final DataFrame

Or in "one line":

# sep='.' is needed because the columns are named player.1, player.2, ...
df2 = pd.wide_to_long(df1, stubnames='player', i=['teamname'], j='player_num', sep='.')
df2 = df2.dropna()
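
For reference, here is a minimal self-contained sketch of the wide_to_long approach. It assumes the columns are literally named player.1, player.2 and player.3 (which is why sep="." is passed) and that the missing entries are real NaN values rather than the string "nan". The sample frame is rebuilt from the question; the names long_df and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; missing entries are assumed to be real NaN.
df1 = pd.DataFrame({
    "teamname": ["xyz", "gh1", "oed"],
    "player.1": ["abc", np.nan, "jeo"],
    "player.2": [np.nan, "hgf", np.nan],
    "player.3": ["def", "jnr", np.nan],
})

# sep="." tells wide_to_long that the numeric suffix follows a dot ("player.1").
long_df = pd.wide_to_long(
    df1, stubnames="player", i="teamname", j="player_num", sep="."
)

# Drop the empty slots, then flatten the (teamname, player_num) index into columns.
result = long_df.dropna().reset_index().sort_values(["teamname", "player_num"])
result = result[["teamname", "player"]].reset_index(drop=True)
print(result)
```

The result has one row per (team, player) pair. Sorting on player_num before dropping it keeps the players in their original slot order within each team.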

Answer 1: (score: 0)

I would go with melt(), which is quite versatile. Starting from:

  teamname player.1 player.2 player.3
0      xyz      abc      NaN      def
1      gh1      NaN      hgf      jnr
2      oed      jeo      NaN      NaN

running the following gives the output shown after it:

df.melt(id_vars=['teamname'], value_name='player').dropna().drop('variable', axis=1).sort_values(['teamname'], ascending=False).set_index('teamname')


         player
teamname       
xyz         abc
xyz         def
oed         jeo
gh1         hgf
gh1         jnr

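The part after melt() drops the NaN values, removes the column we don't need (variable), and sorts the DataFrame. Finally, we set teamname as the index.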
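
The same chain can be sketched as a self-contained script. This is only an illustration under a few assumptions: the sample frame is rebuilt from the question with real NaN values, the melted column is named slot (a made-up name) so it can be used to keep a deterministic order within each team, and the final groupby step is an optional extra for the comma-separated form that the title hints at.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; missing entries are assumed to be real NaN.
df = pd.DataFrame({
    "teamname": ["xyz", "gh1", "oed"],
    "player.1": ["abc", np.nan, "jeo"],
    "player.2": [np.nan, "hgf", np.nan],
    "player.3": ["def", "jnr", np.nan],
})

long_df = (
    df.melt(id_vars=["teamname"], var_name="slot", value_name="player")
      .dropna(subset=["player"])          # remove the empty player slots
      .sort_values(["teamname", "slot"])  # deterministic order within each team
      .drop(columns="slot")               # the slot label is no longer needed
      .set_index("teamname")
)
print(long_df)

# Optional: one row per team, with that team's players joined by commas.
joined = long_df.groupby(level="teamname")["player"].agg(",".join)
print(joined)
```

Sorting on both teamname and slot avoids relying on the sort algorithm to preserve the original order of rows that share a team name.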