I have a DataFrame like this:
import pandas as pd
from numpy import nan

df = pd.DataFrame({'dim': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'A'},
                   'id': {0: 1, 1: 1, 2: 2, 3: 2, 4: 3},
                   'value1': {0: nan, 1: 1.2, 2: 2.0, 3: nan, 4: 3.0},
                   'value2': {0: 1.0, 1: 2.0, 2: nan, 3: nan, 4: nan}})
  dim  id  value1  value2
0   A   1     NaN     1.0
1   B   1     1.2     2.0
2   A   2     2.0     NaN
3   B   2     NaN     NaN
4   A   3     3.0     NaN
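I now want to aggregate the values of the different dims per id so that the following condition holds: if the value for dim == 'A' is not missing (NaN), take the value from dim == 'A'. Otherwise take the value from dim == 'B' (if that one is not missing). If both are missing, the result is missing as well.
So the result should be: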
   id  value1  value2
0   1     1.2     1.0
1   2     2.0     NaN
2   3     3.0     NaN
My guess is that I need some form of groupby, but I'm not quite sure. Maybe something with apply?
Answer 0 (score: 4)
You can reshape with set_index, unstack and swaplevel, and then use combine_first:
df1 = df.set_index(['id', 'dim']).unstack().swaplevel(0, 1, axis=1)
# alternative (pandas 2.0+ accepts only keyword arguments in pivot)
# df1 = df.pivot(index='id', columns='dim').swaplevel(0, 1, axis=1)
print(df1)
dim      A      B      A      B
    value1 value1 value2 value2
id
1      NaN    1.2    1.0    2.0
2      2.0    NaN    NaN    NaN
3      3.0    NaN    NaN    NaN
df2 = df1['A'].combine_first(df1['B']).reset_index()
print(df2)
   id  value1  value2
0   1     1.2     1.0
1   2     2.0     NaN
2   3     3.0     NaN
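To see why combine_first yields the required priority, here is a minimal sketch on two small frames. The names a, b and combined are made up for illustration and only stand in for df1['A'] and df1['B']. The caller's non-missing values are kept, and the other frame is consulted only where the caller has NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1['A'] and df1['B'] (names are not from the answer).
a = pd.DataFrame({'value1': [np.nan, 2.0, 3.0],
                  'value2': [1.0, np.nan, np.nan]}, index=[1, 2, 3])
b = pd.DataFrame({'value1': [1.2, np.nan, np.nan],
                  'value2': [2.0, np.nan, np.nan]}, index=[1, 2, 3])

# Values from `a` win; NaN cells in `a` are filled from `b` at the same
# row and column label. Cells missing in both frames stay NaN.
combined = a.combine_first(b)
print(combined)
```

Swapping the operands (b.combine_first(a)) would give 'B' the priority instead.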
A similar solution that uses xs to select from the MultiIndex:
df1 = df.set_index(['id', 'dim']).unstack()
# alternative (pandas 2.0+ accepts only keyword arguments in pivot)
# df1 = df.pivot(index='id', columns='dim')
print(df1)
    value1      value2
dim      A    B      A    B
id
1      NaN  1.2    1.0  2.0
2      2.0  NaN    NaN  NaN
3      3.0  NaN    NaN  NaN
df2 = df1.xs('A', axis=1, level=1).combine_first(df1.xs('B', axis=1, level=1)).reset_index()
print(df2)
   id  value1  value2
0   1     1.2     1.0
1   2     2.0     NaN
2   3     3.0     NaN
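Since the question asks about groupby, the following is a hedged sketch of a different technique from the reshaping shown in the answer: sort so that 'A' rows come before 'B' rows, then take the first non-missing value per column within each id. It assumes that the alphabetical order of dim matches the desired priority; the variable name result is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'dim': ['A', 'B', 'A', 'B', 'A'],
                   'id': [1, 1, 2, 2, 3],
                   'value1': [np.nan, 1.2, 2.0, np.nan, 3.0],
                   'value2': [1.0, 2.0, np.nan, np.nan, np.nan]})

# Assumption: sorting 'dim' alphabetically puts the preferred rows ('A') first.
# GroupBy.first() returns the first non-null entry of each column per group,
# so 'B' is used only where 'A' is NaN; if both are NaN the result is NaN.
result = (df.sort_values('dim', kind='stable')
            .groupby('id')[['value1', 'value2']]
            .first()
            .reset_index())
print(result)
```

If the priority does not follow alphabetical order, dim could be converted to an ordered categorical before sorting.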