Python Pandas: conditional value selection when aggregating rows

Asked: 2017-09-11 10:07:41

Tags: python pandas numpy python-3.5

I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'dim': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'A'},
                   'id': {0: 1, 1: 1, 2: 2, 3: 2, 4: 3},
                   'value1': {0: np.nan, 1: 1.2, 2: 2.0, 3: np.nan, 4: 3.0},
                   'value2': {0: 1.0, 1: 2.0, 2: np.nan, 3: np.nan, 4: np.nan}})

  dim  id  value1  value2
0   A   1     NaN     1.0
1   B   1     1.2     2.0
2   A   2     2.0     NaN
3   B   2     NaN     NaN
4   A   3     3.0     NaN

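I now want to aggregate the values of the different dim rows per id, subject to the following rule: if the value for dim == 'A' is not missing (NaN), take the value from dim == 'A'. Otherwise take the value from dim == 'B', if that one is not missing. If both are missing, the result is NaN.

So the result should be: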

   id  value1  value2
0   1     1.2     1.0
1   2     2.0     NaN
2   3     3.0     NaN

My guess is that I need some form of groupby, but I'm not sure. Maybe something with apply?

1 Answer:

Answer 0 (score: 4)

You can reshape with set_index, unstack, and swaplevel, and then use combine_first:

df1 = df.set_index(['id','dim']).unstack().swaplevel(0,1,axis=1)
# alternative (pivot takes keyword arguments in pandas 2.0+)
# df1 = df.pivot(index='id', columns='dim').swaplevel(0, 1, axis=1)
print (df1)
dim      A      B      A      B
    value1 value1 value2 value2
id                             
1      NaN    1.2    1.0    2.0
2      2.0    NaN    NaN    NaN
3      3.0    NaN    NaN    NaN

df2 = df1['A'].combine_first(df1['B']).reset_index()
print (df2)
   id  value1  value2
0   1     1.2     1.0
1   2     2.0     NaN
2   3     3.0     NaN
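
To see why combine_first matches the rule in the question, here is a minimal sketch on two toy Series. The names a and b and their values are made up for illustration; they stand in for the 'A' and 'B' columns of a single value column after reshaping.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: `a` plays the role of dim == 'A', `b` of dim == 'B'.
a = pd.Series([np.nan, 2.0, 1.0, np.nan], index=[1, 2, 3, 4])
b = pd.Series([1.2, np.nan, 2.0, np.nan], index=[1, 2, 3, 4])

# Keep the values of `a`; fill only its missing entries from `b`.
# Where both are missing, the result stays NaN.
result = a.combine_first(b)
print(result)
```

The caller's values take priority, so df1['A'].combine_first(df1['B']) prefers 'A' and falls back to 'B' only where 'A' is NaN.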

A similar solution that uses xs to select from the MultiIndex:

df1 = df.set_index(['id','dim']).unstack()
# alternative (pivot takes keyword arguments in pandas 2.0+)
# df1 = df.pivot(index='id', columns='dim')
print (df1)
    value1      value2     
dim      A    B      A    B
id                         
1      NaN  1.2    1.0  2.0
2      2.0  NaN    NaN  NaN
3      3.0  NaN    NaN  NaN

df2 = df1.xs('A', axis=1, level=1).combine_first(df1.xs('B', axis=1, level=1)).reset_index()
print (df2)
   id  value1  value2
0   1     1.2     1.0
1   2     2.0     NaN
2   3     3.0     NaN
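
Since the question asks about groupby, a possible alternative (not part of the original answer) is to sort the rows so that 'A' comes before 'B' and then take the first non-null value per column within each id. This sketch assumes that the preferred dim sorts first alphabetically and that each (id, dim) pair occurs at most once; the name df3 is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'dim': ['A', 'B', 'A', 'B', 'A'],
                   'id': [1, 1, 2, 2, 3],
                   'value1': [np.nan, 1.2, 2.0, np.nan, 3.0],
                   'value2': [1.0, 2.0, np.nan, np.nan, np.nan]})

# Sort so that 'A' rows precede 'B' rows (assumption: alphabetical order
# matches the desired priority). GroupBy.first() then returns the first
# non-null value per column in each group, or NaN if all are missing.
df3 = (df.sort_values('dim')
         .groupby('id')[['value1', 'value2']]
         .first()
         .reset_index())
print(df3)
```

If the priority were not alphabetical, the sort key would need to be replaced, for example by an ordered categorical, so the reshape plus combine_first approach may be easier to adapt.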