I have this dataframe:
ID key
0 1 A
1 1 B
2 2 C
3 3 D
4 3 E
5 3 E
I want to create additional key columns whenever there are duplicate IDs in the ID column.
Here is a snippet of the output:
ID key key2
0 1 A B # Note: ID#1 appeared twice in the dataframe, so the key value "B"
# associated with the duplicate ID will be stored in the new column "key2"
The full output should look like this:
ID key key2 key3
0 1 A B NaN
1 2 C NaN NaN
2 3 D E E # ID#3 appears three times. The key
# of the second occurrence, "E", will be stored under the "key2" column
# and the key of the third occurrence, "E", in the new column "key3"
Any suggestions or ideas on how I should approach this?
Thanks,
Answer 0 (score: 1)
Check out groupby and apply; their respective docs are here and here. You can then unstack (docs) the extra level of the MultiIndex that apply creates.
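import pandas as pd
# one small Series per ID, indexed key_0, key_1, ...; unstack moves that level into columns
df.groupby('ID')['key'].apply(
    lambda s: pd.Series(s.values, index=['key_%s' % i for i in range(s.shape[0])])
).unstack(-1)
Output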
key_0 key_1 key_2
ID
1 A B None
2 C None None
3 D E E
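To see the extra MultiIndex level that unstack(-1) turns into columns, here is a minimal sketch that rebuilds the question's sample frame (assumed from the data shown above) and prints the intermediate Series before unstacking. The variable names stacked and wide are illustrative only.

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3],
                   'key': ['A', 'B', 'C', 'D', 'E', 'E']})

# apply returns one small Series per ID; pandas concatenates them into a
# single Series whose index has two levels: (ID, 'key_i')
stacked = df.groupby('ID')['key'].apply(
    lambda s: pd.Series(s.values, index=['key_%s' % i for i in range(s.shape[0])])
)
print(stacked)

# unstack(-1) moves the last index level ('key_0', 'key_1', ...) into columns;
# IDs with fewer rows get missing values in the extra columns
wide = stacked.unstack(-1)
print(wide)
```

The string labels are sorted when they become columns, so they stay in order here only because no ID repeats more than ten times.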
If you want ID as a column, call reset_index on this DataFrame.
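A minimal end-to-end sketch combining the answer's code with reset_index, again assuming the question's sample data. The renaming step that maps key_0, key_1, key_2 to key, key2, key3 (the names in the question's desired output) is an addition for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3],
                   'key': ['A', 'B', 'C', 'D', 'E', 'E']})

wide = df.groupby('ID')['key'].apply(
    lambda s: pd.Series(s.values, index=['key_%s' % i for i in range(s.shape[0])])
).unstack(-1)

# Illustrative renaming: key_0 -> key, key_1 -> key2, key_2 -> key3
wide.columns = ['key' if c == 'key_0' else 'key%d' % (int(c.split('_')[1]) + 1)
                for c in wide.columns]

# Turn the ID index back into an ordinary column
result = wide.reset_index()
print(result)
```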
Answer 1 (score: 1)
You can use cumcount together with pivot_table:
# number the rows of each ID 0, 1, 2, ... and turn that into a column label
df['cols'] = 'key' + df.groupby('ID').cumcount().astype(str)
print(df.pivot_table(index='ID', columns='cols', values='key', aggfunc=''.join))
cols key0 key1 key2
ID
1 A B None
2 C None None
3 D E E
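Because each (ID, occurrence number) pair is unique, the same reshape can be sketched with plain DataFrame.pivot instead of pivot_table, so no aggfunc is needed. This is a swapped-in variant, not the answerer's code; the helper column n and the key/key2/key3 renaming are illustrative assumptions. Keeping the occurrence numbers as integers until the end also avoids the lexicographic column order ('key10' before 'key2') that string labels get once an ID repeats more than ten times.

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3],
                   'key': ['A', 'B', 'C', 'D', 'E', 'E']})

# 0 for the first row of each ID, 1 for the second, and so on
occurrence = df.groupby('ID').cumcount()

# (ID, n) pairs are unique, so pivot needs no aggregation function
wide = (df.assign(n=occurrence)
          .pivot(index='ID', columns='n', values='key'))

# Integer columns sort numerically; name them only at the end
wide.columns = ['key' if n == 0 else 'key%d' % (n + 1) for n in wide.columns]
result = wide.reset_index()
print(result)
```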