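TL;DR: is there a solution for computing a rolling_mean over a time-based window (regardless of whether the window contains 10, 100, or only 2 samples)? More precisely: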
import time
import pandas as pd

df = pd.DataFrame(columns=['x'])
for i in range(10):
    df.loc[pd.Timestamp.now(), 'x'] = 10 + i
    time.sleep(0.2)  # here 0.2 seconds between each new data point...
df.loc[pd.Timestamp.now(), 'x'] = 20
time.sleep(1)  # here 1 second...
df.loc[pd.Timestamp.now(), 'x'] = 21
time.sleep(3)  # here 3 seconds...
df.loc[pd.Timestamp.now(), 'x'] = 22
This gives df:
                         x
2016-01-08 13:57:10.679 10
2016-01-08 13:57:10.882 11
2016-01-08 13:57:11.085 12
2016-01-08 13:57:11.287 13
2016-01-08 13:57:11.489 14
2016-01-08 13:57:11.691 15
2016-01-08 13:57:11.893 16
2016-01-08 13:57:12.095 17
2016-01-08 13:57:12.297 18
2016-01-08 13:57:12.499 19
2016-01-08 13:57:12.701 20
2016-01-08 13:57:13.703 21
2016-01-08 13:57:16.706 22
and pd.rolling_mean(df, 5) gives:
                         x
2016-01-08 13:57:10.679 NaN
2016-01-08 13:57:10.882 NaN
2016-01-08 13:57:11.085 NaN
2016-01-08 13:57:11.287 NaN
2016-01-08 13:57:11.489 12
2016-01-08 13:57:11.691 13
2016-01-08 13:57:11.893 14
2016-01-08 13:57:12.095 15
2016-01-08 13:57:12.297 16
2016-01-08 13:57:12.499 17
2016-01-08 13:57:12.701 18
2016-01-08 13:57:13.703 19
2016-01-08 13:57:16.706 20
Of course, pd.rolling_mean(df, 5) computes the rolling mean over 5 rows, which is not what I want: I want a window of 5 seconds.
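One solution would be df.resample('1S', ...), but since I want to compute a new rolling_mean every time new data is added, I would have to .resample(...) the whole DataFrame many times per minute. That is really time-consuming, and I don't think it is a clean solution. (In my real use case, the DataFrame is large.)
Is there a clean solution for this?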
Answer 0: (score: 0)
How about storing the rolling mean in your df as you add new data?
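import datetime as dt
import pandas as pd
latest = pd.Timestamp.now()
five_secs = dt.timedelta(seconds=5)
new_x = 99
# Samples from the last 5 seconds, plus the new one
recent = df.loc[df.index > latest - five_secs, 'x']
window_mean = pd.concat([recent, pd.Series([new_x])]).mean()
# Append the new row together with its 5-second mean
df.loc[latest, 'x'] = new_x
df.loc[latest, 'five_second_mean'] = window_mean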
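If the concern is the cost of rescanning a large DataFrame on every new sample, one possible variation on this idea is to keep only the samples of the current window in a deque, together with their running sum. The sketch below is an illustration under that assumption, not code from this answer; the class name TrailingMean and its add method are hypothetical, and the window is taken to be (now - 5 s, now].

```python
from collections import deque
from datetime import datetime, timedelta


class TrailingMean:
    """Mean of the samples whose timestamp lies in (ts - window, ts]."""

    def __init__(self, window):
        self.window = window    # a positive timedelta
        self.samples = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0        # running sum of the values in the deque

    def add(self, ts, value):
        """Record a new sample and return the updated trailing mean."""
        self.samples.append((ts, value))
        self.total += value
        # Drop samples that have fallen out of the window. The sample just
        # added is always inside it, so the deque never becomes empty here.
        while self.samples[0][0] <= ts - self.window:
            _, old_value = self.samples.popleft()
            self.total -= old_value
        return self.total / len(self.samples)


start = datetime(2016, 1, 8, 13, 57, 10)
tm = TrailingMean(timedelta(seconds=5))
m1 = tm.add(start, 10)                         # only one sample so far
m2 = tm.add(start + timedelta(seconds=2), 20)  # both samples in the window
m3 = tm.add(start + timedelta(seconds=6), 30)  # the first sample has expired
print(m1, m2, m3)
```

Each sample is appended once and removed at most once, so the cost per update does not grow with the size of the DataFrame. The returned mean can then be stored next to the new row, as in the snippet above.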
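Answer 1: (score: 0)
Consider using the Series apply function to capture the last 5 seconds for a given row. With this approach, you can run it once after all the data has been collected. The only caveat for your setup is that you cannot use apply() on the index, so use a temporary timestamp column (equal to the index values):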
import datetime
...
# SERIES MEAN FUNCTION
def runMean(row):
    # x values whose timestamp falls in the 5 seconds up to and including `row`
    ser = df.x[(df['timeval'] > row - datetime.timedelta(seconds=5)) &
               (df['timeval'] <= row)]
    return ser.mean()

# APPLY FUNCTION
df['timeval'] = df.index
df['last5secMean'] = df['timeval'].apply(runMean)
df = df[['x','last5secMean']]
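For comparison, here is a minimal sketch of the same trailing 5-second mean using a time-offset window. It assumes a pandas version in which rolling() accepts an offset string such as '5s' on a DatetimeIndex, which is not the case for the pd.rolling_mean API used in the question. The data is rebuilt from the question's printed output, and the helper trailing_mean is a hypothetical brute-force check using the same (t - 5 s, t] window as runMean.

```python
import pandas as pd

# Timestamps and values copied from the question's example output.
times = pd.to_datetime([
    "2016-01-08 13:57:10.679", "2016-01-08 13:57:10.882",
    "2016-01-08 13:57:11.085", "2016-01-08 13:57:11.287",
    "2016-01-08 13:57:11.489", "2016-01-08 13:57:11.691",
    "2016-01-08 13:57:11.893", "2016-01-08 13:57:12.095",
    "2016-01-08 13:57:12.297", "2016-01-08 13:57:12.499",
    "2016-01-08 13:57:12.701", "2016-01-08 13:57:13.703",
    "2016-01-08 13:57:16.706",
])
df = pd.DataFrame({"x": range(10, 23)}, index=times)  # x = 10, 11, ..., 22

# Time-based window: each row averages the samples in (t - 5 s, t].
rolling_5s = df["x"].rolling("5s").mean()


def trailing_mean(ts, window=pd.Timedelta(seconds=5)):
    """Brute-force mean over (ts - window, ts], for cross-checking only."""
    mask = (df.index > ts - window) & (df.index <= ts)
    return df.loc[mask, "x"].mean()


brute = pd.Series([trailing_mean(ts) for ts in df.index], index=df.index)
print(pd.DataFrame({"x": df["x"], "rolling_5s": rolling_5s, "brute": brute}))
```

With this data, every row up to 13:57:13.703 still has the first sample inside its window, so the result is a growing mean there. Only the last row, at 13:57:16.706, drops the samples older than 13:57:11.706.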