Calculating the difference between a DatetimeIndex and a Timestamp column

Time: 2013-11-08 16:13:54

Tags: python pandas python-datetime

In the following example:

import datetime
import pandas

base = datetime.datetime.now()
rr = [base - datetime.timedelta(days=x) for x in range(23)]
ee = [base - datetime.timedelta(days=x+3) for x in range(23)]
qq = pandas.DataFrame(data=rr, index=ee, columns=['datacol'])

qq.index - qq.datacol.values

the last line raises a TypeError:

In [11]: qq.index-qq.datacol.values
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-e850e726adac> in <module>()
----> 1 qq.index-qq.datacol.values

/usr/local/lib/python2.7/dist-packages/pandas/tseries/index.pyc in __sub__(self, other)
    556             return self.shift(-other)
    557         else:  # pragma: no cover
--> 558             raise TypeError(other)
    559 
    560     def _add_delta(self, delta):

TypeError: ['2013-11-08T21:18:50.478689000-0800' '2013-11-07T21:18:50.478689000-0800'

How can I compute the difference between the index and the column?

Note: the data was created from datetime objects, but the index was automatically converted to Timestamps.

1 answer:

Answer 0 (score: 1)

Here is an example that reproduces your problem:

In [11]: rng = pd.date_range('2012-01-01', '2012-01-06')

In [12]: df = pd.DataFrame(rng, rng + 10)

In [13]: df
Out[13]: 
                             0
2012-01-11 2012-01-01 00:00:00
2012-01-12 2012-01-02 00:00:00
2012-01-13 2012-01-03 00:00:00
2012-01-14 2012-01-04 00:00:00
2012-01-15 2012-01-05 00:00:00
2012-01-16 2012-01-06 00:00:00

You can compute the difference (between the index and column 0) directly in numpy:

In [14]: df.index.values - df[0].values
Out[14]: 
array([864000000000000, 864000000000000, 864000000000000, 864000000000000,
       864000000000000, 864000000000000], dtype='timedelta64[ns]')
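
The raw values above are nanosecond counts: 864000000000000 ns is 10 × 86400 × 10^9 ns, i.e. 10 days. As a minimal sketch of the same subtraction in a self-contained script, the following converts the result to a number of days by dividing by a one-day `np.timedelta64`. It assumes a current pandas/numpy install; the explicit `pd.Timedelta(days=10)` offset is a substitution for the `rng + 10` shorthand in the transcript, which newer pandas versions may not accept.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2012-01-01', '2012-01-06')
# Index is the same range shifted forward by 10 days (assumed replacement
# for the `rng + 10` shorthand used in the transcript).
df = pd.DataFrame(rng.values, index=rng + pd.Timedelta(days=10), columns=[0])

# Subtracting the underlying numpy arrays gives a timedelta64 array.
diff = df.index.values - df[0].values

# Dividing by a one-day timedelta turns it into a float array of days.
days = diff / np.timedelta64(1, 'D')
print(days)
```

Working on `.values` sidesteps the index arithmetic entirely, which is why it avoids the TypeError shown in the question.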

and convert it to a Series:

In [15]: pd.Series(df.index.values - df[0].values)
Out[15]: 
0   10 days, 00:00:00
1   10 days, 00:00:00
2   10 days, 00:00:00
3   10 days, 00:00:00
4   10 days, 00:00:00
5   10 days, 00:00:00
dtype: timedelta64[ns]
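
Note that the Series above gets a fresh 0..5 index. If the result should stay aligned with the original rows, one option (a sketch, not part of the original answer) is to pass `index=df.index` when building the Series, after which it can be stored as a new column. The column name `delta` is a made-up name for illustration.

```python
import pandas as pd

rng = pd.date_range('2012-01-01', '2012-01-06')
df = pd.DataFrame(rng.values, index=rng + pd.Timedelta(days=10), columns=[0])

# index=df.index keeps the original row labels on the result.
delta = pd.Series(df.index.values - df[0].values, index=df.index)

# Hypothetical column name; assignment aligns on the shared index.
df['delta'] = delta
print(df)
```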

Honestly, I think this part of pandas (timedeltas) is currently being improved, so there may be a better way in a later version...
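
For what it's worth, a sketch of how this can look without dropping to numpy, assuming a reasonably recent pandas release (the answer itself does not say which version, and this was not available in the version used in the question): wrapping the column in a `DatetimeIndex` makes the subtraction element-wise and returns a `TimedeltaIndex`.

```python
import pandas as pd

rng = pd.date_range('2012-01-01', '2012-01-06')
df = pd.DataFrame(rng.values, index=rng + pd.Timedelta(days=10), columns=[0])

# Both operands are DatetimeIndex objects of equal length, so the
# subtraction is element-wise and yields a TimedeltaIndex.
delta = df.index - pd.DatetimeIndex(df[0])

# .days extracts the whole-day component of each difference.
days = delta.days
print(delta)
```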