I'm trying to convert a DataFrame like the following:
index  apple
1      [(red,3),(green,2)]
1      [(red,3)]
1      [(yellow,9),(red,3)]
1      [(green,2),(yellow,9)]
1      [(green,2),(yellow,9),(pink,50)]
2      [(yellow,14),(red,1)]
2      [(green,5)]
into this:
index  apple_red  apple_green  apple_yellow  apple_pink
1      3          2            9             50
2      1          5            14            0
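Note that in the original table each tuple is unique within its index, so a given color always carries the same value for a given index. Any idea how to do this?
Thanks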
Answer 0 (score: 1)
You can use:
the DataFrame constructor with a list comprehension;
stack to reshape, followed by reset_index;
drop_duplicates to remove duplicates by the columns created from the tuples;
set_index and unstack;
add_prefix, rename_axis and reset_index.
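import pandas as pd

# Assumes df.index is named 'index' and df['apple'] holds lists of (color, count) tuples.
# dropna() guards against pandas versions where stack() keeps NaN.
# 'index' is part of the duplicate key, so equal tuples under different index values are kept.
df1 = pd.DataFrame([dict(x) for x in df['apple']], index=df.index) \
        .stack() \
        .dropna() \
        .astype(int) \
        .reset_index(name='val') \
        .drop_duplicates(['index', 'level_1', 'val']) \
        .set_index(['index', 'level_1'])['val'] \
        .unstack(fill_value=0) \
        .add_prefix('apple_') \
        .rename_axis(None) \
        .rename_axis(None, axis=1) \
        .reset_index()
print(df1)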
   index  apple_green  apple_pink  apple_red  apple_yellow
0      1            2          50          3             9
1      2            5           0          1            14
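For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question under two assumptions that the question does not state explicitly: the cells hold real Python tuples of (str, int), and the frame's index is named 'index'. It also swaps in a different technique from the answer above, Series.explode (available from pandas 0.25) followed by pivot, rather than the dict/stack/unstack chain. The names pairs, long_df and df2 are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed: tuples of (str, int),
# index named 'index').
df = pd.DataFrame(
    {'apple': [
        [('red', 3), ('green', 2)],
        [('red', 3)],
        [('yellow', 9), ('red', 3)],
        [('green', 2), ('yellow', 9)],
        [('green', 2), ('yellow', 9), ('pink', 50)],
        [('yellow', 14), ('red', 1)],
        [('green', 5)],
    ]},
    index=pd.Index([1, 1, 1, 1, 1, 2, 2], name='index'),
)

# One (color, count) tuple per row; the index value is repeated as needed.
pairs = df['apple'].explode()

# Split each tuple into two columns, keeping the original index values.
long_df = pd.DataFrame(pairs.tolist(), index=pairs.index, columns=['color', 'val'])

# Keep one row per (index, color); the question states the values agree.
long_df = long_df.reset_index().drop_duplicates(['index', 'color'])

# Pivot colors into columns; combinations that never occur become 0.
df2 = (long_df.pivot(index='index', columns='color', values='val')
              .fillna(0)
              .astype(int)
              .add_prefix('apple_')
              .rename_axis(None, axis=1)
              .reset_index())
print(df2)
```

On the sample data this should give the same table as df1 above. The explode route avoids building one dict per row, but it does assume every list is non-empty, since explode turns an empty list into NaN.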