I have a dataframe:
df => A B C
3.8314 60.6247 -1
3.8167 60.6247 -2
3.7524 60.6247 -1
3.7407 60.6247 -1
3.6939 60.7713 -1
3.8899 60.7957 -2
3.8723 60.7957 -3
3.7144 60.7957 -1
3.7904 62.4083 -7
3.7758 62.4083 -1
3.6676 62.4083 -6
3.6588 62.4083 -6
3.6471 62.4083 -5
3.5828 62.6771 -6
3.5681 62.6771 -1
3.5272 62.6771 -7
3.5418 62.7015 -1
3.6383 62.9458 -7
4.0010 63.3856 -2
3.6997 63.3856 -2
3.6822 63.3856 -2
4.0185 63.4101 -2
3.7027 63.9231 -2
3.6851 63.9231 -3
3.5535 63.9231 -3
3.5389 63.9231 -3
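If the B values of two or more rows are within +/- 0.03 of each other, and the A values of those rows are within +/- 0.026 of each other, I want to replace those rows with their mean, giving the following dataframe: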
df => A B C
3.82405 60.6247 -1.5
3.74655 60.6247 -1
3.84090 60.7835 -1.5
3.79335 60.7957 -2
3.7831 62.4083 -4
3.65783 62.4083 -8.5
3.57545 62.6771 -3.5
3.5345 62.6771 -4
3.6383 62.9458 -7
4.00975 63.39785 -2
3.69095 63.3856 -2
3.6939 63.9231 -2.5
3.5462 63.9231 -3
Any ideas on how to do this?
Answer:
Try this:
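close = df.A.diff().abs().lt(0.026) & df.B.diff().abs().lt(0.03)  # True where a row is within both tolerances of the row directly above it
df.groupby((~close).cumsum()).mean()  # every row that is not close to its predecessor starts a new group; each group is averaged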
Out[642]:
A B C
1 3.824050 60.6247 -1.500000
2 3.746550 60.6247 -1.000000
3 3.693900 60.7713 -1.000000
4 3.881100 60.7957 -2.500000
5 3.714400 60.7957 -1.000000
6 3.783100 62.4083 -4.000000
7 3.657833 62.4083 -5.666667
8 3.575450 62.6771 -3.500000
9 3.534500 62.6893 -4.000000
10 3.638300 62.9458 -7.000000
11 4.001000 63.3856 -2.000000
12 3.690950 63.3856 -2.000000
13 4.018500 63.4101 -2.000000
14 3.693900 63.9231 -2.500000
15 3.546200 63.9231 -3.000000
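The groupby/cumsum approach above compares each row only with the row directly above it (that is what `diff()` does), and `cumsum()` turns every row that is not close to its predecessor into the start of a new group. Rows that satisfy both tolerances but are not adjacent therefore stay apart: 4.0010/63.3856 and 4.0185/63.4101 end up in groups 11 and 13. If the rule in the question is meant to apply to any pair of rows regardless of order, one possible approach is sketched below. It rests on that assumption and is not the answerer's method: build a pairwise closeness matrix with NumPy broadcasting, then merge linked rows with a small union-find. `merge_close_rows`, `tol_a` and `tol_b` are hypothetical names. Closeness is treated as transitive, so a chain of rows that are each close to the next is merged even if its two ends are farther apart than the tolerances.

```python
import io

import numpy as np
import pandas as pd

# The question's data, pasted as whitespace-separated text.
DATA = """\
A B C
3.8314 60.6247 -1
3.8167 60.6247 -2
3.7524 60.6247 -1
3.7407 60.6247 -1
3.6939 60.7713 -1
3.8899 60.7957 -2
3.8723 60.7957 -3
3.7144 60.7957 -1
3.7904 62.4083 -7
3.7758 62.4083 -1
3.6676 62.4083 -6
3.6588 62.4083 -6
3.6471 62.4083 -5
3.5828 62.6771 -6
3.5681 62.6771 -1
3.5272 62.6771 -7
3.5418 62.7015 -1
3.6383 62.9458 -7
4.0010 63.3856 -2
3.6997 63.3856 -2
3.6822 63.3856 -2
4.0185 63.4101 -2
3.7027 63.9231 -2
3.6851 63.9231 -3
3.5535 63.9231 -3
3.5389 63.9231 -3
"""


def merge_close_rows(df, tol_a=0.026, tol_b=0.03):
    """Hypothetical helper: average rows linked by |dA| < tol_a and |dB| < tol_b.

    Unlike diff(), every pair of rows is compared, not just neighbours.
    Links are transitive (connected components via union-find).
    """
    a = df["A"].to_numpy()
    b = df["B"].to_numpy()
    # n x n matrix: True where both tolerances hold for rows i and j.
    close = (np.abs(a[:, None] - a[None, :]) < tol_a) & (
        np.abs(b[:, None] - b[None, :]) < tol_b
    )

    parent = list(range(len(df)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union every close pair (upper triangle only, diagonal excluded).
    for i, j in zip(*np.nonzero(np.triu(close, k=1))):
        parent[find(i)] = find(j)

    labels = np.array([find(i) for i in range(len(df))])
    # sort=False keeps groups in order of their first row.
    return df.groupby(labels, sort=False).mean().reset_index(drop=True)


df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")
result = merge_close_rows(df)
print(result)
```

On the question's data this gives 13 rows, the same count as the expected output in the question. The 4.0010 and 4.0185 rows are averaged into one (A 4.00975, B 63.39785), and 3.6939/60.7713 is merged with 3.7144/60.7957 because their B values differ by only 0.0244. The pairwise matrix needs O(n²) memory, which is fine for small frames but not for very large ones.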