Sum over the last 2 seconds

Time: 2018-10-05 01:23:38

Tags: python pandas

Consider the following simple example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'mytime' : [pd.to_datetime('2018-01-01 14:34:12.340'),
                             pd.to_datetime('2018-01-01 14:34:13.0'),
                             pd.to_datetime('2018-01-01 14:34:15.342'),
                             pd.to_datetime('2018-01-01 14:34:16.42'),
                             pd.to_datetime('2018-01-01 14:34:28.742')],
                    'myvalue' : [1, 2, np.nan, 3, 1],
                    'mychart' : ['a', 'b', 'c', 'd', 'e']})

# the '2s' rolling window below is measured against this datetime index
df.set_index('mytime', inplace=True)
df
Out[142]: 
                        mychart  myvalue
mytime                                  
2018-01-01 14:34:12.340       a      1.0
2018-01-01 14:34:13.000       b      2.0
2018-01-01 14:34:15.342       c      NaN
2018-01-01 14:34:16.420       d      3.0
2018-01-01 14:34:28.742       e      1.0

Here I want to use rolling to compute the rolling sum of myvalue over the last 2 seconds.

Yes, the last two seconds, not the last two observations :)
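To make the time-based window concrete, here is a minimal sketch that recomputes the sum by hand. For each timestamp t it takes the rows with t - 2s < time <= t (the `closed='right'` interval), sums the non-NaN values, and returns NaN when there are none, which mimics the default `min_periods=1` that pandas uses for offset-based windows. `manual_time_window_sum` is a hypothetical helper for illustration only, not a pandas function.

```python
import numpy as np
import pandas as pd

times = pd.to_datetime(['2018-01-01 14:34:12.340', '2018-01-01 14:34:13.0',
                        '2018-01-01 14:34:15.342', '2018-01-01 14:34:16.42',
                        '2018-01-01 14:34:28.742'])
s = pd.Series([1, 2, np.nan, 3, 1], index=times, name='myvalue')


def manual_time_window_sum(series, window):
    """Hypothetical helper: sum over (t - window, t] for each timestamp t.

    NaN values are skipped; if a window holds no non-NaN value the result
    is NaN, similar to min_periods=1 on an offset-based rolling window.
    """
    out = []
    for t in series.index:
        # right-closed interval: strictly after t - window, up to and including t
        mask = (series.index > t - window) & (series.index <= t)
        valid = series[mask].dropna()
        out.append(valid.sum() if len(valid) > 0 else np.nan)
    return pd.Series(out, index=series.index, name=series.name)


manual = manual_time_window_sum(s, pd.Timedelta('2s'))
builtin = s.rolling(window='2s', closed='right').sum()
print(pd.DataFrame({'manual': manual, 'rolling_sum': builtin}))
```

At 14:34:15.342 the window (14:34:13.342, 14:34:15.342] contains only the NaN row, so both columns are NaN there; at 14:34:16.420 the window holds NaN and 3.0, giving 3.0.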

This should work, but two similar calls give different results:

# apply: the lambda is called on the values of each 2-second window
df['myrol1'] = df.myvalue.rolling(window='2s', closed='right').apply(lambda x: x.sum())
# built-in rolling sum
df['myrol2'] = df.myvalue.rolling(window='2s', closed='right').sum()

df
Out[152]: 
                        mychart  myvalue  myrol1  myrol2
mytime                                                  
2018-01-01 14:34:12.340       a      1.0     1.0     1.0
2018-01-01 14:34:13.000       b      2.0     3.0     3.0
2018-01-01 14:34:15.342       c      NaN     NaN     NaN
2018-01-01 14:34:16.420       d      3.0     NaN     3.0
2018-01-01 14:34:28.742       e      1.0     1.0     1.0
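The difference can be reproduced by making `Rolling.apply`'s `raw` argument explicit. With `raw=True` each window reaches the lambda as a NumPy array, and `ndarray.sum()` propagates NaN; with `raw=False` it arrives as a pandas Series, whose `sum()` skips NaN. This sketch assumes the question was run on an older pandas in which `apply` passed raw arrays by default (since pandas 1.0, `raw` defaults to `False`).

```python
import numpy as np
import pandas as pd

times = pd.to_datetime(['2018-01-01 14:34:12.340', '2018-01-01 14:34:13.0',
                        '2018-01-01 14:34:15.342', '2018-01-01 14:34:16.42',
                        '2018-01-01 14:34:28.742'])
s = pd.Series([1, 2, np.nan, 3, 1], index=times, name='myvalue')
roller = s.rolling(window='2s', closed='right')

# raw=True: window is an ndarray, and ndarray.sum() returns nan if any value is nan
as_array = roller.apply(lambda x: x.sum(), raw=True)
# raw=False: window is a Series, and Series.sum() skips NaN by default
as_series = roller.apply(lambda x: x.sum(), raw=False)

print(pd.DataFrame({'raw=True': as_array, 'raw=False': as_series}))
```

The `raw=True` column reproduces `myrol1` (NaN at 14:34:16.420), while the `raw=False` column matches `myrol2`.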

What is going on with apply here? Everything I try with apply seems to give wrong results. For example:

df.mychart.rolling(window = '2s', closed = 'right').apply(lambda x: ' '.join(x))
Out[160]: 
mytime
2018-01-01 14:34:12.340    a
2018-01-01 14:34:13.000    b
2018-01-01 14:34:15.342    c
2018-01-01 14:34:16.420    d
2018-01-01 14:34:28.742    e
Name: mychart, dtype: object

Thanks!

1 answer:

Answer 0 (score: 2)

You may want to look at np.nansum:

df.myvalue.rolling(window = '2s', closed = 'right').apply(lambda x: np.nansum(x))
Out[248]: 
mytime
2018-01-01 14:34:12.340    1.0
2018-01-01 14:34:13.000    3.0
2018-01-01 14:34:15.342    NaN
2018-01-01 14:34:16.420    3.0
2018-01-01 14:34:28.742    1.0
Name: myvalue, dtype: float64

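This is because your original values contain NaN, and a plain sum returns NaN as soon as any element is NaN, whereas np.nansum ignores NaN values: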

np.sum([0.5, np.nan])
Out[249]: nan
np.nansum([0.5, np.nan])
Out[250]: 0.5
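Following the answer, here is a minimal sketch that passes `np.nansum` directly to `apply` with `raw=True` (so each window is a plain NumPy array) and checks that it agrees with the built-in rolling sum, which already skips NaN. In practice the built-in `.sum()` is the simpler choice; the `apply` route only matters for aggregations pandas does not provide.

```python
import numpy as np
import pandas as pd

times = pd.to_datetime(['2018-01-01 14:34:12.340', '2018-01-01 14:34:13.0',
                        '2018-01-01 14:34:15.342', '2018-01-01 14:34:16.42',
                        '2018-01-01 14:34:28.742'])
s = pd.Series([1, 2, np.nan, 3, 1], index=times, name='myvalue')
roller = s.rolling(window='2s', closed='right')

# np.nansum treats NaN as zero inside each window
via_nansum = roller.apply(np.nansum, raw=True)
# the built-in aggregation skips NaN on its own
via_builtin = roller.sum()

print(pd.DataFrame({'nansum_apply': via_nansum, 'rolling_sum': via_builtin}))
```

The 14:34:15.342 row stays NaN in both columns: that window contains no non-NaN value, so `min_periods=1` is not met and the function is never called for it.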