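I'm trying to remove all "old" values from a pandas TimeSeries, e.g. all values that are more than 1 day older than the newest value.
Naively, I tried something like this: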
from datetime import timedelta
def trim(series):
    return series[series.index.max() - series.index < timedelta(days=1)]
which gives the error:
TypeError: ufunc 'subtract' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'
Apparently the problem is this expression: series.index.max() - series.index
Then I found that this works:
def trim(series):
    return series[series.index > series.index.max() - timedelta(days=1)]
Can someone explain why the latter works while the former raises an error?
Edit: I'm using pandas version 0.12.0.
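A minimal sketch of the difference, assuming a recent pandas release rather than the 0.12.0 mentioned above. The working version only subtracts a timedelta from a single Timestamp (scalar minus scalar) and then compares the whole DatetimeIndex against that one cutoff. The failing version asks for a Timestamp minus an entire DatetimeIndex, which is exactly the operation the 0.12 error message rejects. In current pandas that subtraction returns a TimedeltaIndex, so both forms select the same rows. The helper names trim_by_age and trim_by_cutoff are hypothetical.

```python
from datetime import timedelta

import pandas as pd


def trim_by_age(series, window=timedelta(days=1)):
    # Timestamp minus DatetimeIndex -> TimedeltaIndex of ages
    # (this is the form that failed in pandas 0.12)
    age = series.index.max() - series.index
    return series[age < window]


def trim_by_cutoff(series, window=timedelta(days=1)):
    # Timestamp minus timedelta -> a single Timestamp cutoff,
    # then DatetimeIndex > Timestamp -> boolean array
    cutoff = series.index.max() - window
    return series[series.index > cutoff]


# Hypothetical sample data: four points, 12 hours apart
idx = pd.DatetimeIndex(["2011-01-01 00:00", "2011-01-01 12:00",
                        "2011-01-02 00:00", "2011-01-02 12:00"])
s = pd.Series([0, 1, 2, 3], index=idx)
print(trim_by_age(s))
print(trim_by_cutoff(s))
```

Both calls keep only the last two points (ages 12 hours and 0), because both comparisons are strict.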
Answer 0 (score: 2)
You can use truncating and fancy indexing as follows:
ts.truncate(before='Some Date')
Example:
import datetime as dt
import pandas as pd
from numpy.random import randn
rng = pd.date_range('1/1/2011', periods=72, freq='D')
ts = pd.Series(randn(len(rng)), index=rng)
ts.truncate(before=(ts.index.max() - dt.timedelta(days=1)).strftime('%m-%d-%Y'))
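This truncates everything before the cutoff date (one day before the newest timestamp). If needed, you can also pass the after argument to trim the series further.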
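A sketch of the same idea without the strftime round trip, assuming a current pandas where truncate accepts a Timestamp for before (not verified against 0.12). Note that truncate keeps the boundary row itself, so this selects index >= cutoff, whereas the question's mask used a strict >.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

rng = pd.date_range("2011-01-01", periods=72, freq="D")
ts = pd.Series(np.random.randn(len(rng)), index=rng)

# Cutoff: one day before the newest timestamp (a single Timestamp)
cutoff = ts.index.max() - timedelta(days=1)

# truncate(before=...) drops every row earlier than the cutoff;
# the cutoff row itself is kept (inclusive bound)
recent = ts.truncate(before=cutoff)
print(recent)
```

With daily data this leaves two rows: the cutoff day and the newest day.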
Answer 1 (score: 2)
Here is an example in 0.13 (to_timedelta is not available in 0.12, so there you would have to use np.timedelta64(4,'D') instead):
In [12]: rng = pd.date_range('1/1/2011', periods=10, freq='D')
In [13]: ts = pd.Series(randn(len(rng)), index=rng)
In [14]: ts
Out[14]:
2011-01-01 -0.348362
2011-01-02 1.782487
2011-01-03 1.146537
2011-01-04 -0.176308
2011-01-05 -0.185240
2011-01-06 1.767135
2011-01-07 0.615911
2011-01-08 2.459799
2011-01-09 0.718081
2011-01-10 -0.520741
Freq: D, dtype: float64
In [15]: x = ts.index.to_series().max()-ts.index.to_series()
In [16]: x
Out[16]:
2011-01-01 9 days
2011-01-02 8 days
2011-01-03 7 days
2011-01-04 6 days
2011-01-05 5 days
2011-01-06 4 days
2011-01-07 3 days
2011-01-08 2 days
2011-01-09 1 days
2011-01-10 0 days
Freq: D, dtype: timedelta64[ns]
In [17]: x[x>pd.to_timedelta('4 days')]
Out[17]:
2011-01-01 9 days
2011-01-02 8 days
2011-01-03 7 days
2011-01-04 6 days
2011-01-05 5 days
Freq: D, dtype: timedelta64[ns]
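The session above filters the timedelta Series x but stops short of trimming ts itself. Because x shares ts's index, the comparison result is a boolean Series aligned with ts and can be used as a mask directly. A sketch using np.timedelta64, the 0.12 substitute for to_timedelta mentioned above; it has only been reasoned about against a current pandas, and the <= keeps the entries that are at most 4 days old.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2011-01-01", periods=10, freq="D")
ts = pd.Series(np.random.randn(len(rng)), index=rng)

# Age of every entry relative to the newest one, as a timedelta64 Series
idx = ts.index.to_series()
age = idx.max() - idx

# Keep only entries at most 4 days old; the boolean mask aligns on ts's index
recent = ts[age <= np.timedelta64(4, "D")]
print(recent)
```

Here recent holds 2011-01-06 through 2011-01-10, the complement of the rows selected in In [17].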