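I cannot resample a Pandas time series when the values are timedelta objects. Pandas will happily compute the mean of a Series of timedeltas, but it seems to trip up when resampling the same Series.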
import numpy as np
import pandas as pd
from datetime import timedelta

# a Series of timedeltas
rng = pd.date_range('1/1/2000', periods=100, freq='D')
r = [timedelta(hours=i) for i in np.random.random(len(rng))]
ts = pd.Series(r, index=rng)
ts.mean() # fine
# DataError: No numeric types to aggregate
ts.resample('M', how='mean')
# this is better, but ..
ts.resample('M', how=pd.Series.mean) # works. Hurrah.
ts.resample('T', how=pd.Series.mean) # fail: Must produce aggregated value
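Passing the function pd.Series.mean directly to resample works for some data, but it trips up if, for example, a sampling bucket has no values (as with the minute frequency 'T' above). I expect that is why it is preferable to pass 'mean' and let pandas do the right thing. In this case, though, 'mean' does not seem to pick an appropriate function.
This answer sidesteps the same problem and suggests groupby. That strikes me as more of a workaround(?). The resample approach looks like it should work, so what am I missing? (pandas 0.14)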
Answer 0 (score: 2)
This is not implemented at the moment, but it is going into 0.14.1 (see the issue). As a workaround, you can do this:
In [1]: rng = pd.date_range('1/1/2000', periods=100, freq='D')
In [2]: r = [timedelta(hours=i) for i in np.random.random(len(rng))]
In [3]: ts = pd.Series(r, index=rng)
In [4]: ts
Out[4]:
2000-01-01 00:03:10.322420
2000-01-02 00:24:59.112675
2000-01-03 00:32:14.511518
2000-01-04 00:52:58.694410
2000-01-05 00:18:29.775375
2000-01-06 00:12:39.262857
2000-01-07 00:33:27.589009
2000-01-08 00:55:25.054240
2000-01-09 00:20:47.593920
2000-01-10 00:30:10.429640
2000-01-11 00:59:28.416187
2000-01-12 00:25:52.223876
2000-01-13 00:15:44.470747
2000-01-14 00:43:24.809208
2000-01-15 00:08:12.211051
...
2000-03-26 00:40:14.156113
2000-03-27 00:06:28.998191
2000-03-28 00:08:35.440506
2000-03-29 00:33:26.654861
2000-03-30 00:34:39.304583
2000-03-31 00:10:20.184603
2000-04-01 00:50:13.484530
2000-04-02 00:40:11.975429
2000-04-03 00:04:36.064879
2000-04-04 00:42:54.793764
2000-04-05 00:58:30.588331
2000-04-06 00:34:17.431583
2000-04-07 00:34:55.479245
2000-04-08 00:47:24.305921
2000-04-09 00:14:42.699607
Freq: D, Length: 100
Group by month, then take the mean:
In [5]: ts.groupby(pd.Grouper(freq='M')).apply(lambda x: x.mean()[0])
Out[5]:
2000-1-31 00:32:13.413522
2000-2-29 00:26:06.009614
2000-3-31 00:31:57.965306
2000-4-30 00:36:25.202588
dtype: timedelta64[ns]
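If the groupby workaround is not convenient, another option is to go through plain floats: convert the timedeltas to seconds, resample those, and convert the result back. The sketch below is only an illustration under stated assumptions. The helper name `resample_timedelta_mean` is hypothetical (it is not a pandas API), the data is seeded random data shaped like the question's, and `'MS'` (month start) and `'min'` are used in place of `'M'` and `'T'` only because those two aliases are spelled differently in newer pandas releases.

```python
import numpy as np
import pandas as pd


def resample_timedelta_mean(ts, freq):
    """Hypothetical helper: mean of a timedelta Series per resample bucket."""
    # Floats can always be aggregated, so convert durations to seconds first.
    seconds = ts.dt.total_seconds()
    # Buckets with no observations come out as NaN instead of raising.
    mean_seconds = seconds.resample(freq).mean()
    # Convert back to timedeltas; NaN becomes NaT.
    return pd.to_timedelta(mean_seconds, unit="s")


# Same shape of data as in the question, seeded for reproducibility.
rng = pd.date_range("2000-01-01", periods=100, freq="D")
hours = np.random.RandomState(0).random_sample(len(rng))
ts = pd.Series(pd.to_timedelta(hours, unit="h"), index=rng)

# One mean per calendar month, labelled by the first day of the month.
monthly = resample_timedelta_mean(ts, "MS")
# Upsampling three daily values to minutes leaves most buckets empty (NaT).
per_minute = resample_timedelta_mean(ts.iloc[:3], "min")

print(monthly)
print(per_minute.dropna())
```

Because the aggregation runs on floats, the empty minute buckets that made `pd.Series.mean` fail in the question simply show up as NaT in the result. The round trip through floating-point seconds can lose sub-nanosecond precision, which is usually irrelevant for durations of this size.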