Pandas: Replace similar values with a derived value

Time: 2018-04-09 23:16:41

Tags: pandas

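How can I group nearby values (within some threshold) and replace them with an aggregate (e.g. mean, max, etc.)? For example, consider the following data: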




cat1 cat2 value      new_value
A    a    1523314515 1523314515
A    b    1523318114 1523318114
A    c    1523318115 1523318114
B    a    1523314604 1523314603
B    b    1523314605 1523314603
B    c    1523314603 1523314603
B    d    1523331024 1523331024
C    a    1523313948 1523313948
C    b    1523314790 1523314790
D    a    1523313952 1523313952
D    b    1523314815 1523314815
E    a    1523529294 1523529292
E    b    1523529295 1523529292
E    c    1523529292 1523529292
E    d    1523529297 1523529292

 


Within each group defined by cat1, if values lie within 10 of one another, new_value should be the minimum of that cluster.


1 answer:

Answer 0 (score: 0)

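If I understand correctly, here is a solution using np.where. Row 2 differs from the expected output, so either I have not captured your description exactly, or df.loc[2, 'new_value'] should be 1523318115 rather than 1523318114.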

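import numpy as np
# Minimum 'value' within each cat1 group (df holds the question's data)
cat2min = df.groupby('cat1')['value'].min()
# Broadcast each group's minimum back onto its rows
mins = df['cat1'].map(cat2min)
# Use the group minimum when the value lies within 10 of it; otherwise keep the value
df['new_value_calc'] = np.where(np.abs(df['value'] - mins) <= 10,
                                mins,
                                df['value'])
df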
   cat1 cat2       value   new_value  new_value_calc
0     A    a  1523314515  1523314515      1523314515
1     A    b  1523318114  1523318114      1523318114
2     A    c  1523318115  1523318114      1523318115
3     B    a  1523314604  1523314603      1523314603
4     B    b  1523314605  1523314603      1523314603
5     B    c  1523314603  1523314603      1523314603
6     B    d  1523331024  1523331024      1523331024
7     C    a  1523313948  1523313948      1523313948
8     C    b  1523314790  1523314790      1523314790
9     D    a  1523313952  1523313952      1523313952
10    D    b  1523314815  1523314815      1523314815
11    E    a  1523529294  1523529292      1523529292
12    E    b  1523529295  1523529292      1523529292
13    E    c  1523529292  1523529292      1523529292
14    E    d  1523529297  1523529292      1523529292
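The np.where approach compares each value only with its group's overall minimum, which is why row 2 keeps 1523318115. The expected output appears consistent with a different reading of the question: sort each cat1 group and keep values in the same cluster as long as each one is within 10 of its sorted neighbour. Under that reading, 1523318114 and 1523318115 in group A form their own cluster, whose minimum is 1523318114. The following is a minimal sketch under that assumption; the helper name `snap_to_clusters` and its parameters are hypothetical, and `agg` can be any aggregation name accepted by `transform`, such as 'min', 'max' or 'mean'.

```python
import pandas as pd

# The question's sample data; new_value holds the expected result.
df = pd.DataFrame({
    'cat1': list('AAABBBBCCDDEEEE'),
    'cat2': list('abcabcdabababcd'),
    'value': [1523314515, 1523318114, 1523318115, 1523314604, 1523314605,
              1523314603, 1523331024, 1523313948, 1523314790, 1523313952,
              1523314815, 1523529294, 1523529295, 1523529292, 1523529297],
    'new_value': [1523314515, 1523318114, 1523318114, 1523314603, 1523314603,
                  1523314603, 1523331024, 1523313948, 1523314790, 1523313952,
                  1523314815, 1523529292, 1523529292, 1523529292, 1523529292],
})


def snap_to_clusters(frame, group_col='cat1', value_col='value',
                     threshold=10, agg='min'):
    """Hypothetical helper: replace each value with an aggregate of its cluster.

    Assumption: within a group, sorted values stay in the same cluster
    while each gap to the previous sorted value is <= threshold.
    """
    # Sort so that neighbouring values in each group are adjacent.
    s = frame.sort_values([group_col, value_col])
    # Gap to the previous value in the same group (NaN for a group's first row).
    gap = s.groupby(group_col)[value_col].diff()
    # Start a new cluster at every group start and at every gap > threshold.
    cluster_id = (gap.isna() | (gap > threshold)).cumsum()
    # Aggregate within each cluster and broadcast the result back to its rows.
    result = s.groupby(cluster_id)[value_col].transform(agg)
    # Restore the original row order.
    return result.reindex(frame.index)


df['new_value_calc'] = snap_to_clusters(df)
print(df)
```

With this data, new_value_calc matches new_value on every row, including row 2. Note that clusters built from gaps between sorted neighbours chain: 0, 8 and 16 would form one cluster even though 0 and 16 are 16 apart. If every member must instead be within 10 of the cluster minimum, the break condition would have to compare each value against the current cluster's first value rather than its predecessor.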