Trimming a time series by timedelta

Asked: 2014-01-21 08:34:21

Tags: python pandas time-series timedelta

I am trying to remove all "old" values from a pandas TimeSeries, e.g. all values older than 1 day (relative to the most recent value).

Naively, I tried something like this:

from datetime import timedelta
def trim(series):
    return series[series.index.max() - series.index < timedelta(days=1)]

which gives the error:

TypeError: ufunc 'subtract' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

Apparently the problem lies in this expression: series.index.max() - series.index

I then found that this works:

def trim(series):
    return series[series.index > series.index.max() - timedelta(days=1)]

Can someone explain why the latter works while the former raises an error?

Edit: I am using pandas version 0.12.0

2 Answers:

Answer 0 (score: 2)

You can use Truncating and Fancy Indexing as follows:

ts.truncate(before='Some Date')

Example:

import datetime as dt
import pandas as pd
from numpy.random import randn

rng = pd.date_range('1/1/2011', periods=72, freq='D')
ts = pd.Series(randn(len(rng)), index=rng)

ts.truncate(before=(ts.index.max() - dt.timedelta(days=1)).strftime('%m-%d-%Y'))

This should truncate everything before the cutoff date. If needed, you can also pass the after parameter to narrow the range further.
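As a hedged variation on the above: truncate also appears to accept a Timestamp directly, so the strftime round trip may not be needed. The sketch below assumes a reasonably recent pandas; the names cutoff and trimmed are illustrative and not from the answer.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

# Cutoff is one day before the most recent timestamp in the index.
cutoff = ts.index.max() - timedelta(days=1)

# truncate keeps rows at or after `before`, so the cutoff row itself is kept.
trimmed = ts.truncate(before=cutoff)
print(trimmed)
```

Note that truncate is inclusive of the cutoff, whereas the comparison series.index > cutoff from the question excludes it, so the two can differ by one row.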

Answer 1 (score: 2)

Here is an example in 0.13 (to_timedelta is not available in 0.12, so there you would have to use np.timedelta64(4,'D') instead):

In [12]: rng = pd.date_range('1/1/2011', periods=10, freq='D')

In [13]: ts = pd.Series(randn(len(rng)), index=rng)

In [14]: ts
Out[14]: 
2011-01-01   -0.348362
2011-01-02    1.782487
2011-01-03    1.146537
2011-01-04   -0.176308
2011-01-05   -0.185240
2011-01-06    1.767135
2011-01-07    0.615911
2011-01-08    2.459799
2011-01-09    0.718081
2011-01-10   -0.520741
Freq: D, dtype: float64

In [15]: x = ts.index.to_series().max()-ts.index.to_series()

In [16]: x
Out[16]: 
2011-01-01   9 days
2011-01-02   8 days
2011-01-03   7 days
2011-01-04   6 days
2011-01-05   5 days
2011-01-06   4 days
2011-01-07   3 days
2011-01-08   2 days
2011-01-09   1 days
2011-01-10   0 days
Freq: D, dtype: timedelta64[ns]

In [17]: x[x>pd.to_timedelta('4 days')]
Out[17]: 
2011-01-01   9 days
2011-01-02   8 days
2011-01-03   7 days
2011-01-04   6 days
2011-01-05   5 days
Freq: D, dtype: timedelta64[ns]
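The session above filters the timedelta series itself. To get back to the original goal of trimming the data, the same difference can be used as a boolean mask on the series. This is a minimal sketch, assuming a reasonably recent pandas; the max_age parameter and the name recent are additions for illustration, not part of the question's trim.

```python
import numpy as np
import pandas as pd

def trim(series, max_age):
    """Keep entries whose age, relative to the newest timestamp, is below max_age."""
    # Convert the index to a Series so the subtraction yields timedelta64 values.
    stamps = series.index.to_series()
    age = stamps.max() - stamps
    # `age` shares the index of `series`, so it can be used as a boolean mask.
    return series[age < max_age]

rng = pd.date_range('1/1/2011', periods=10, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

# np.timedelta64 is the form the answer suggests for versions without to_timedelta.
recent = trim(ts, np.timedelta64(4, 'D'))
print(recent)
```

With the ten daily values above, this keeps the entries aged 0 to 3 days, i.e. 2011-01-07 through 2011-01-10.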