I have a pandas DataFrame with the following columns:
game_id, date, country, winner_name, winner_age, ... winner_ranking, loser_name, loser_age, ... loser_ranking
1 1/2/10 UK Ben 21 ... 12 Michael 22 ... 13
I want to reshape it into the following format:
game_id, date, country, competitor, name, age, ranking
1 1/2/10 UK winner Ben 21 12
1 1/2/10 UK loser Michael 22 13
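That is, for every column whose name starts with the prefix "winner_" or "loser_", strip the prefix and split the winner and the loser into separate rows. The list of winner and loser variables is long, so a solution that requires hard-coding them would not help much.
This is how I currently do it. Is there a neater way, for example using melt?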
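import pandas as pd

# colnames is assumed to be the full list of column names
colnames = combined_df.columns
# keep the id columns plus the winner_* columns, then strip the prefix
winner_df = combined_df.loc[:, [x for x in colnames if 'loser_' not in x]].copy()
winner_df.columns = [c.replace('winner_', '') for c in winner_df.columns]
winner_df['competitor'] = 'winner'
# same for the loser_* columns
loser_df = combined_df.loc[:, [x for x in colnames if 'winner_' not in x]].copy()
loser_df.columns = [c.replace('loser_', '') for c in loser_df.columns]
loser_df['competitor'] = 'loser'
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
long_df = pd.concat([winner_df, loser_df], sort=False)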
Answer 0 (score: 1)
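First move all columns without an underscore into the index with DataFrame.set_index, then create a MultiIndex in the columns with Series.str.split, and finally reshape with DataFrame.stack, followed by reset_index and rename for the new column: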
df = df.set_index(['game_id','date','country'])
df.columns = df.columns.str.split('_', expand=True)
df = df.stack(0).reset_index().rename(columns={'level_3':'competitor'})
print (df)
game_id date country competitor age name ranking
0 1 1/2/10 UK loser 22 Michael 13
1 1 1/2/10 UK winner 21 Ben 12
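For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same set_index / split / stack idea. The sample frame only mirrors the single row shown in the question (the elided "..." columns are left out), and the helper name to_long and its id_cols parameter are made up for illustration. Two small variations from the code above, both assumptions on my part: the split uses n=1 so that only the first underscore is treated as the prefix separator, and the column level is named before stacking so no rename of level_3 is needed.

```python
import pandas as pd

# Sample data mirroring the one row in the question; the "..." columns are omitted.
combined_df = pd.DataFrame({
    "game_id": [1],
    "date": ["1/2/10"],
    "country": ["UK"],
    "winner_name": ["Ben"],
    "winner_age": [21],
    "winner_ranking": [12],
    "loser_name": ["Michael"],
    "loser_age": [22],
    "loser_ranking": [13],
})


def to_long(df, id_cols=("game_id", "date", "country")):
    """Hypothetical helper: one row per competitor, with prefixes stripped."""
    out = df.set_index(list(id_cols))
    # n=1 splits on the first underscore only, so a column such as
    # "winner_first_name" would become ("winner", "first_name").
    out.columns = out.columns.str.split("_", n=1, expand=True)
    # Name the winner/loser level so reset_index yields a "competitor" column.
    out.columns.names = ["competitor", None]
    # Move the winner/loser level from the columns into the rows.
    return out.stack(level=0).reset_index()


long_df = to_long(combined_df)
print(long_df)
```

The order of the rows and of the name/age/ranking columns can differ between pandas versions, so select columns by name rather than by position if the order matters.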