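How can I group nearby values (within some threshold) and replace them with an aggregate (e.g., mean, max, etc.)? For example, consider the following data: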


 cat1 cat2 value new_value
 A a 1523314515 1523314515
 A b 1523318114 1523318114
 A c 1523318115 1523318114
 B a 1523314604 1523314603
 B b 1523314605 1523314603
 B c 1523314603 1523314603
 B d 1523331024 1523331024
 C a 1523313948 1523313948
 C b 1523314790 1523314790
 D a 1523313952 1523313952
 D b 1523314815 1523314815
 E a 1523529294 1523529292
 E b 1523529295 1523529292
 E c 1523529292 1523529292
 E d 1523529297 1523529292


 
Within each group defined by cat1, if values are within 10 of each other, the new value should be the minimum of that cluster.
Answer 0 (score: 0)
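If I understand correctly, here is a solution using np.where. It compares each value with the minimum of its cat1 group. The result for row 2 differs from your expected output, so either I have not captured your description accurately, or df.loc[2, 'new_value'] should be 1523318115 rather than 1523318114.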
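import numpy as np

# df is assumed to be a pandas DataFrame holding the data from the question
# Minimum value within each cat1 group, mapped back onto every row
cat2min = df.groupby('cat1')['value'].min()
mins = df['cat1'].map(cat2min)
# Use the group minimum when a value is within 10 of it; otherwise keep the value
df['new_value_calc'] = np.where(np.abs(df['value'] - mins) <= 10,
                                mins,
                                df['value'])
df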
   cat1 cat2       value   new_value  new_value_calc
0     A    a  1523314515  1523314515      1523314515
1     A    b  1523318114  1523318114      1523318114
2     A    c  1523318115  1523318114      1523318115
3     B    a  1523314604  1523314603      1523314603
4     B    b  1523314605  1523314603      1523314603
5     B    c  1523314603  1523314603      1523314603
6     B    d  1523331024  1523331024      1523331024
7     C    a  1523313948  1523313948      1523313948
8     C    b  1523314790  1523314790      1523314790
9     D    a  1523313952  1523313952      1523313952
10    D    b  1523314815  1523314815      1523314815
11    E    a  1523529294  1523529292      1523529292
12    E    b  1523529295  1523529292      1523529292
13    E    c  1523529292  1523529292      1523529292
14    E    d  1523529297  1523529292      1523529292
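
If the intent is to cluster values that lie close to one another, rather than close to the group minimum, one common approach is to sort within each cat1 group and start a new cluster whenever the gap to the previous value exceeds the threshold. The following is only a sketch under that assumption: the gap-based reading of "within 10", the THRESHOLD constant, and the column name new_value_clustered are my own choices, not something stated in the question.

```python
import pandas as pd

# Sample data from the question (cat1, cat2, value)
rows = [
    ("A", "a", 1523314515), ("A", "b", 1523318114), ("A", "c", 1523318115),
    ("B", "a", 1523314604), ("B", "b", 1523314605), ("B", "c", 1523314603),
    ("B", "d", 1523331024),
    ("C", "a", 1523313948), ("C", "b", 1523314790),
    ("D", "a", 1523313952), ("D", "b", 1523314815),
    ("E", "a", 1523529294), ("E", "b", 1523529295), ("E", "c", 1523529292),
    ("E", "d", 1523529297),
]
df = pd.DataFrame(rows, columns=["cat1", "cat2", "value"])

THRESHOLD = 10  # assumption: a gap larger than this starts a new cluster

# Sort so that neighbouring values within each cat1 group are adjacent
s = df.sort_values(["cat1", "value"])

# Gap to the previous value in the same group (NaN for the first row of a group)
gap = s.groupby("cat1")["value"].diff()

# A new cluster starts at the first row of each group or after a large gap
new_cluster = gap.isna() | (gap > THRESHOLD)
cluster_id = new_cluster.cumsum()

# Replace each value with the minimum of its cluster; assignment aligns on index
df["new_value_clustered"] = s.groupby(cluster_id)["value"].transform("min")
print(df)
```

On the sample data this should reproduce the new_value column from the question, including row 2. Swapping "min" for "mean" or "max" in transform gives the other aggregates. One caveat: because clusters are built from consecutive gaps, a long chain of closely spaced values can end up in a single cluster whose total spread exceeds the threshold.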