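I have a pandas DataFrame like this
, and I want to convert it to the following (without using any loops!):
Any ideas on how to do this?
In case the images do not display:
I have a DataFrame with 2 columns: Name and Hobby. It has the following rows: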
Anna drawing
Anna swimming
Anna skiing
Lisa running
Lisa singing
Tom drawing
I want to convert it to a DataFrame with 4 columns: Name, Hobby 1, Hobby 2, Hobby 3, with the following rows:
Anna drawing swimming skiing
Lisa running singing NaN
Tom drawing NaN NaN
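Answer 0 (score: 0)
Use groupby
and reset_index
to create a numeric index (0, 1, 2, ...) for each hobby within each Name group. Then reshape with unstack, which moves that inner index level into columns and yields the wide, matrix-like format. Names with fewer hobbies get NaN in the remaining columns.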
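import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'Anna', 'Anna', 'Lisa', 'Lisa', 'Tom'],
                   'Hobby': ['drawing', 'swimming', 'skiing', 'running', 'singing', 'drawing']})
# Number the hobbies 0..n-1 within each Name group, then unstack that
# inner index level into columns and label them Hobby 1, Hobby 2, ...
result = (df.groupby('Name')['Hobby']
            .apply(lambda x: x.reset_index(drop=True))
            .unstack()
            .rename(columns=lambda x: f'Hobby {x+1}'))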
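The same reshaping can also be sketched with cumcount and pivot instead of apply. This is only an alternative under the assumption that Name should end up as an ordinary column, as in the desired output; the names pos and wide are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anna', 'Anna', 'Anna', 'Lisa', 'Lisa', 'Tom'],
    'Hobby': ['drawing', 'swimming', 'skiing', 'running', 'singing', 'drawing'],
})

# 'pos' (hypothetical helper column) numbers each hobby 1, 2, 3, ...
# within its Name group, in order of appearance.
wide = (df.assign(pos=df.groupby('Name').cumcount() + 1)
          .pivot(index='Name', columns='pos', values='Hobby')
          .rename(columns=lambda n: f'Hobby {n}')   # 1 -> 'Hobby 1', ...
          .rename_axis(columns=None)                # drop the leftover 'pos' label
          .reset_index())                           # turn Name back into a column

print(wide)
```

Pivoting on the integer position and renaming afterwards keeps the columns in numeric order even if someone has ten or more hobbies.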
Edit: combinations of two hobbies
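# Self-merge on Name to pair each hobby with every other hobby of the same person
df_merge = (df.merge(df, on='Name')
              .assign(Hobby_combi=lambda x: x.Hobby_x + '|' + x.Hobby_y)
              .loc[lambda x: x.Hobby_x != x.Hobby_y]   # drop pairs of a hobby with itself
              .groupby(['Name', 'Hobby_combi'])
              .size()
              .unstack()
              .reindex(df['Name'].unique())            # restore names with fewer than two hobbies
              .fillna(0)
              .astype(int))                            # NaN -> 0, then back to integer counts
# Number of people having each (ordered) hobby pair
df_merge.sum()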
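Note that the self-merge produces every pair in both orders (for example drawing|swimming and swimming|drawing). If the order of the two hobbies does not matter, which is an assumption the question does not state, a minimal sketch is to keep only the alphabetically ordered pair. The names pairs and pair_counts are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anna', 'Anna', 'Anna', 'Lisa', 'Lisa', 'Tom'],
    'Hobby': ['drawing', 'swimming', 'skiing', 'running', 'singing', 'drawing'],
})

# Keep only rows where the left hobby sorts before the right one, so each
# unordered pair appears once per person and self-pairs disappear as well.
pairs = (df.merge(df, on='Name')
           .loc[lambda x: x.Hobby_x < x.Hobby_y]
           .assign(Hobby_combi=lambda x: x.Hobby_x + '|' + x.Hobby_y))

# Number of people sharing each unordered hobby pair.
pair_counts = pairs['Hobby_combi'].value_counts()
print(pair_counts)
```

People with a single hobby, such as Tom, contribute no pairs and so do not appear in pairs at all.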