Unable to resample Pandas time series data of Python timedelta objects

Time: 2014-06-05 23:39:48

Tags: python pandas timedelta

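I am unable to resample Pandas time series data when the values are timedelta objects. Pandas will happily compute the mean of a Series of timedeltas, but it seems to stumble when resampling that same Series.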

import numpy as np
import pandas as pd
from datetime import timedelta

# a Series of timedeltas
rng = pd.date_range('1/1/2000', periods=100, freq='D')
r = [timedelta(hours=i) for i in np.random.random(len(rng))]
ts = pd.Series(r, index=rng)

ts.mean()  # fine

# DataError: No numeric types to aggregate
ts.resample('M', how='mean')

# this is better, but ..
ts.resample('M', how=pd.Series.mean)   # works. Hurrah.
ts.resample('T', how=pd.Series.mean)   # fail: Must produce aggregated value

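Passing the function pd.Series.mean directly to resample works for some data, but it trips up when, for example, a resampling bucket contains no values (such as the minute frequency 'T' above). I expect that is why it is better to pass 'mean' and let Pandas do the right thing. In this case, though, 'mean' does not seem to select an appropriate function.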
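To make the empty-bucket point concrete, here is a minimal sketch. The three-day series and the names counts, n_buckets and n_empty are made up for illustration, and 'min' is used as the minute alias instead of 'T'. It only counts observations per bucket, which shows how many buckets a custom aggregation function would be called on with nothing in them.

```python
import pandas as pd

# Hypothetical data: three daily observations holding timedeltas.
rng = pd.date_range('2000-01-01', periods=3, freq='D')
ts = pd.Series(pd.to_timedelta([1, 2, 3], unit='h'), index=rng)

# Count the observations that fall into each one-minute bucket.
counts = ts.resample('min').count()

n_buckets = len(counts)             # 2 days * 1440 minutes + 1 final bucket
n_empty = int((counts == 0).sum())  # buckets with no observation at all

print(n_buckets, n_empty)
```

Only three of the buckets hold a value, so any function passed to resample has to cope with empty input for almost every bucket.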

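This answer sidesteps the same problem and suggests groupby. That strikes me as more of a workaround(?). The resample approach looks like it should work, so what am I missing? (pandas 0.14)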

1 answer:

Answer 0 (score: 2)

This is not implemented yet, but it is going into 0.14.1 (see the issue).

As a workaround, you can do this:

In [1]: rng = pd.date_range('1/1/2000', periods=100, freq='D')

In [2]: r = [timedelta(hours=i) for i in np.random.random(len(rng))]

In [3]: ts = pd.Series(r, index=rng)

In [4]: ts
Out[4]: 
2000-01-01   00:03:10.322420
2000-01-02   00:24:59.112675
2000-01-03   00:32:14.511518
2000-01-04   00:52:58.694410
2000-01-05   00:18:29.775375
2000-01-06   00:12:39.262857
2000-01-07   00:33:27.589009
2000-01-08   00:55:25.054240
2000-01-09   00:20:47.593920
2000-01-10   00:30:10.429640
2000-01-11   00:59:28.416187
2000-01-12   00:25:52.223876
2000-01-13   00:15:44.470747
2000-01-14   00:43:24.809208
2000-01-15   00:08:12.211051
...
2000-03-26   00:40:14.156113
2000-03-27   00:06:28.998191
2000-03-28   00:08:35.440506
2000-03-29   00:33:26.654861
2000-03-30   00:34:39.304583
2000-03-31   00:10:20.184603
2000-04-01   00:50:13.484530
2000-04-02   00:40:11.975429
2000-04-03   00:04:36.064879
2000-04-04   00:42:54.793764
2000-04-05   00:58:30.588331
2000-04-06   00:34:17.431583
2000-04-07   00:34:55.479245
2000-04-08   00:47:24.305921
2000-04-09   00:14:42.699607
Freq: D, Length: 100

Group by month, then take the mean:

In [5]: ts.groupby(pd.Grouper(freq='M')).apply(lambda x: x.mean()[0])
Out[5]: 
2000-1-31    00:32:13.413522
2000-2-29    00:26:06.009614
2000-3-31    00:31:57.965306
2000-4-30    00:36:25.202588
dtype: timedelta64[ns]
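
The [0] above reflects how the pandas version in this thread returned the mean of a timedelta Series. As far as I know, later releases return a scalar Timedelta, so that indexing would not apply there. A sketch that avoids depending on either behavior is to aggregate plain float seconds and convert back afterwards. The seeded random data and the names seconds, monthly_seconds and monthly are assumptions for illustration, and grouping on index.to_period('M') is swapped in for the Grouper call. This is not the answer's own method.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question, but seeded so it is reproducible.
rng = pd.date_range('2000-01-01', periods=100, freq='D')
hours = np.random.default_rng(0).random(len(rng))
ts = pd.Series(pd.to_timedelta(hours, unit='h'), index=rng)

# Work on float seconds, which every aggregation understands.
seconds = ts.dt.total_seconds()

# Group by calendar month and average the numeric values.
monthly_seconds = seconds.groupby(seconds.index.to_period('M')).mean()

# Convert the averaged seconds back to timedeltas.
monthly = pd.to_timedelta(monthly_seconds, unit='s')
print(monthly)
```

The result is indexed by monthly periods (2000-01 through 2000-04) rather than month-end timestamps. Empty groups, if there were any, would come back as NaT after the conversion.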