Given the following data,
import pandas as pd
data = [['AAA', '2019-01-01', 10], ['AAA', '2019-01-02', 21],
        ['AAA', '2019-02-01', 30], ['AAA', '2019-02-02', 45],
        ['BBB', '2019-01-01', 50], ['BBB', '2019-01-02', 60],
        ['BBB', '2019-02-01', 70], ['BBB', '2019-02-02', 59]]
dfx = pd.DataFrame(data, columns=['NAME', 'TIMESTAMP', 'VALUE'])
  NAME   TIMESTAMP  VALUE
0  AAA  2019-01-01     10
1  AAA  2019-01-02     21
2  AAA  2019-02-01     30
3  AAA  2019-02-02     45
4  BBB  2019-01-01     50
5  BBB  2019-01-02     60
6  BBB  2019-02-01     70
7  BBB  2019-02-02     59
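Is it possible to compare the last value of each group ("NAME") with the mean of the previous three rows, so that the expected output looks something like the following?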
  NAME   TIMESTAMP  VALUE RESULT
0  AAA  2019-01-01     10
1  AAA  2019-01-02     21
2  AAA  2019-02-01     30
3  AAA  2019-02-02     45  False
4  BBB  2019-01-01     50
5  BBB  2019-01-02     60
6  BBB  2019-02-01     70
7  BBB  2019-02-02     59   True
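So for group "AAA" the result is False, because the value 45 is greater than the mean of the previous three values, (10 + 21 + 30) / 3 ≈ 20.33, whereas for group "BBB" the result is True, because the value 59 is lower than the mean of the previous three values, (50 + 60 + 70) / 3 = 60.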
Regards.
Answer 0: (score: 2)
This should work:
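def compare(a, b):
    # False when the last value is above the mean, True when it is below.
    # Equal values fall through and return None.
    if a > b:
        return False
    elif a < b:
        return True

# Mean of the three preceding rows within each NAME group;
# shift() keeps the current row out of its own window.
dfx['rolling_mean'] = dfx.groupby('NAME')['VALUE'].transform(lambda v: v.shift().rolling(3, 3).mean())
# s is False only on the last row of each group.
s = dfx.duplicated('NAME', keep='last')
dfx['RESULT'] = dfx[~s].apply(lambda x: compare(x.VALUE, x.rolling_mean), axis=1)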
Answer 1: (score: 1)
Using duplicated
s=dfx.duplicated('NAME',keep='last')
dfx['RESULT']=dfx[~s].VALUE.le(dfx[s].groupby('NAME')['VALUE'].mean().values)
dfx
  NAME   TIMESTAMP  VALUE RESULT
0  AAA  2019-01-01     10    NaN
1  AAA  2019-01-02     21    NaN
2  AAA  2019-02-01     30    NaN
3  AAA  2019-02-02     45  False
4  BBB  2019-01-01     50    NaN
5  BBB  2019-01-02     60    NaN
6  BBB  2019-02-01     70    NaN
7  BBB  2019-02-02     59   True
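One caveat with the duplicated approach: it averages every row of a group except the last, which matches "the previous three rows" only because each group here has exactly four rows. For longer groups, something along these lines might work. This is a minimal sketch, assuming the rows are already sorted by TIMESTAMP within each NAME; the helper name flag_last_below_prev_mean, its parameter n, and the extra 2018-12-31 row are made up for illustration and are not part of the question or the answers.

```python
import pandas as pd


def flag_last_below_prev_mean(df, n=3):
    """Hypothetical helper: on the last row of each NAME group, report whether
    VALUE is below the mean of the n rows immediately before it."""
    # Mean of the n preceding rows, computed separately inside each group.
    prev_mean = df.groupby('NAME')['VALUE'].transform(
        lambda v: v.shift(1).rolling(n, min_periods=n).mean()
    )
    # True only on the last row of each group.
    is_last = ~df.duplicated('NAME', keep='last')
    out = df.copy()
    # Keep the comparison on last rows that have n predecessors; NaN elsewhere.
    out['RESULT'] = (out['VALUE'] < prev_mean).where(is_last & prev_mean.notna())
    return out


# Illustrative data (made up): group AAA has five rows, so the earliest one
# must stay out of the mean.
demo = pd.DataFrame(
    [['AAA', '2018-12-31', 200], ['AAA', '2019-01-01', 10],
     ['AAA', '2019-01-02', 21], ['AAA', '2019-02-01', 30],
     ['AAA', '2019-02-02', 45],
     ['BBB', '2019-01-01', 50], ['BBB', '2019-01-02', 60],
     ['BBB', '2019-02-01', 70], ['BBB', '2019-02-02', 59]],
    columns=['NAME', 'TIMESTAMP', 'VALUE'])
result = flag_last_below_prev_mean(demo)
```

With this data the last AAA row compares 45 against (10 + 21 + 30) / 3 and gives False, whereas averaging all four earlier rows would pull in the 200 and give True. The BBB result is unchanged.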