Question

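I'm new to Python, and I've run into a tricky problem while resampling some data with pandas.

When I want to resample my time series data, applying the arithmetic mean is very simple.

For example:

Suppose ts is time series data at minute frequency (in pandas, a pandas.Series object with a DatetimeIndex).

To get the arithmetic mean of each 5-minute period, all it takes is: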

ts.resample('5min', how='mean')

But how can I compute the geometric mean in the same way? Is there a simple solution like the one above, for example:

ts.resample('5min', how='gmean')

Answer 1

You can pass a callable (in this case, a function) to how, as long as it returns a scalar:

In [31]: from scipy.stats.mstats import gmean

In [32]: import pandas.util.testing as tm

In [33]: ts = tm.makeTimeSeries()[:10]

In [34]: ts
Out[34]:
2000-01-03    0.605
2000-01-04   -0.167
2000-01-05    0.365
2000-01-06   -0.206
2000-01-07   -1.156
2000-01-10   -0.219
2000-01-11    1.704
2000-01-12   -0.148
2000-01-13    1.169
2000-01-14    0.823
Freq: B, dtype: float64

In [35]: ts.resample('2D', how=lambda x: gmean(x).item())
Out[35]:
2000-01-03    0.605
2000-01-05    0.365
2000-01-07    0.000
2000-01-09    0.000
2000-01-11    1.704
2000-01-13    0.981
dtype: float64

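Note that you must call the item method here to get a scalar result (because, depending on the values, you may get a MaskedConstant back). pandas does not treat a single-element Series as a scalar.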
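The session above relies on the how argument, which recent pandas releases no longer accept in resample; there the same idea is usually written by chaining .apply after resample. The following is only a minimal sketch under stated assumptions: the sample series and the helper name geometric_mean are hypothetical, and a plain NumPy log-mean-exp stands in for scipy's gmean. Unlike the masked-array route, this helper returns NaN for bins that are empty or contain non-positive values.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-frequency series of strictly positive values (1.0 .. 10.0).
idx = pd.date_range("2000-01-01", periods=10, freq="min")
ts = pd.Series(np.arange(1.0, 11.0), index=idx)

def geometric_mean(window):
    """Geometric mean of one resampling bin; NaN when it is undefined."""
    values = window.dropna().to_numpy(dtype=float)
    # An empty bin or any non-positive value has no real geometric mean.
    if values.size == 0 or (values <= 0).any():
        return np.nan
    # exp(mean(log(x))) is the geometric mean for positive x.
    return float(np.exp(np.log(values).mean()))

# Newer spelling of ts.resample('5min', how=...): chain .apply on the resampler.
result = ts.resample("5min").apply(geometric_mean)
print(result)
```

Returning NaN rather than 0 is a design choice: it keeps undefined bins visibly distinct from bins whose geometric mean is genuinely zero.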

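Also note that for calculations involving NaNs, or values for which the geometric mean would be complex (for example, the 4th root of a negative number; this would return nan in numpy), gmean turns the result into 0 when you call the item method on it.

This is why there are zeros at 2000-01-07 and 2000-01-09, for example.

At 2000-01-07, pandas fills in nan for the following day (remember that we are resampling at 2D here), so the geometric mean is computed as ma.exp(ma.mean(ma.log([-1.156, nan]))). Neither value is a "valid" input to ma.log (so both are masked), so ma.mean() returns a MaskedConstant whose _data attribute is 0, and the item method therefore returns 0.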
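The masking chain described above can be reproduced step by step with numpy.ma alone. This is a small sketch for illustration, not the internals of scipy's gmean; the variable names are made up, and the value printed by item() may depend on the NumPy version.

```python
import numpy as np
import numpy.ma as ma

# The two values that fall into the 2000-01-07 bin in the example above.
values = np.array([-1.156, np.nan])

# ma.log masks inputs outside its domain instead of returning nan.
logged = ma.log(values)
print(logged.mask)           # mask flags for the two entries

# The mean of a fully masked array is the masked constant.
mean_log = ma.mean(logged)

# Exponentiating the masked constant keeps it masked.
result = ma.exp(mean_log)
print(ma.is_masked(result))

# item() exposes the underlying data of the masked constant,
# which the answer reports as 0.
print(result.item())
```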
