In the following example:
import datetime
import pandas
base = datetime.datetime.now()
rr = [base - datetime.timedelta(days=x) for x in range(23)]
ee = [base - datetime.timedelta(days=x+3) for x in range(23)]
qq = pandas.DataFrame(data=rr, index=ee, columns=['datacol'])
qq.index - qq.datacol.values
The last line raises a TypeError:
In [11]: qq.index-qq.datacol.values
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-e850e726adac> in <module>()
----> 1 qq.index-qq.datacol.values
/usr/local/lib/python2.7/dist-packages/pandas/tseries/index.pyc in __sub__(self, other)
556 return self.shift(-other)
557 else: # pragma: no cover
--> 558 raise TypeError(other)
559
560 def _add_delta(self, delta):
TypeError: ['2013-11-08T21:18:50.478689000-0800' '2013-11-07T21:18:50.478689000-0800'
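How can I compute the difference between the index and the column?
Note: the values were created from datetime objects, but the index was automatically converted to Timestamps.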
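The conversion mentioned in the note can be checked with a small sketch like the one below. The dates and the names `stamps` and `frame` are illustrative assumptions, not taken from the original session.

```python
import datetime

import pandas as pd

# Three plain datetime.datetime objects (illustrative values only).
stamps = [datetime.datetime(2013, 11, 5) - datetime.timedelta(days=x) for x in range(3)]

frame = pd.DataFrame({"datacol": stamps}, index=stamps)

index_type = type(frame.index).__name__          # class of the index object
element_type = type(frame.index[0]).__name__     # class of a single index entry
column_dtype_kind = frame["datacol"].dtype.kind  # "M" means a datetime64 dtype
print(index_type, element_type, column_dtype_kind)
```

So both the index and the column end up stored as datetime64 data, even though plain datetime objects were passed in.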
Answer 0 (score: 1)
Here is an example that reproduces your problem:
In [11]: rng = pd.date_range('2012-01-01', '2012-01-06')
In [12]: df = pd.DataFrame(rng, rng + pd.Timedelta(days=10))
In [13]: df
Out[13]:
0
2012-01-11 2012-01-01 00:00:00
2012-01-12 2012-01-02 00:00:00
2012-01-13 2012-01-03 00:00:00
2012-01-14 2012-01-04 00:00:00
2012-01-15 2012-01-05 00:00:00
2012-01-16 2012-01-06 00:00:00
You can compute the difference (between the index and column 0) directly in numpy:
In [14]: df.index.values - df[0].values
Out[14]:
array([864000000000000, 864000000000000, 864000000000000, 864000000000000,
864000000000000, 864000000000000], dtype='timedelta64[ns]')
And convert the result to a Series:
In [15]: pd.Series(df.index.values - df[0].values)
Out[15]:
0 10 days, 00:00:00
1 10 days, 00:00:00
2 10 days, 00:00:00
3 10 days, 00:00:00
4 10 days, 00:00:00
5 10 days, 00:00:00
dtype: timedelta64[ns]
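Applied to the frame from the question, the same NumPy subtraction might look like the sketch below. It assumes a fixed `base` timestamp instead of the current time so the run is reproducible, and the name `diff` is only illustrative.

```python
import datetime

import pandas as pd

# Fixed timestamp (an assumption) in place of datetime.datetime.now().
base = datetime.datetime(2013, 11, 8, 21, 18, 50)
rr = [base - datetime.timedelta(days=x) for x in range(23)]
ee = [base - datetime.timedelta(days=x + 3) for x in range(23)]
qq = pd.DataFrame(data=rr, index=ee, columns=["datacol"])

# Subtract the raw datetime64 arrays, then wrap the timedelta64 result
# in a Series that keeps the original index as its labels.
diff = pd.Series(qq.index.values - qq["datacol"].values, index=qq.index)

# Each index entry is 3 days earlier than its column value,
# so every element of diff is -3 days.
print(diff.head())
```

Passing `index=qq.index` is a design choice: without it the Series gets a plain 0..n-1 index, as in the output above.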
To be honest, I think this part of pandas (timedeltas) is currently being improved, so there may be a nicer way to do this in a later release...
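As one possibility on more recent pandas releases, the subtraction can stay at the Series level instead of dropping to NumPy. This is a sketch that assumes a reasonably modern pandas installation; the name `diff` is illustrative, and column `0` mirrors the example frame above.

```python
import pandas as pd

rng = pd.date_range("2012-01-01", "2012-01-06")

# Same shape as the example frame: the index is 10 days ahead of column 0.
df = pd.DataFrame({0: rng.to_numpy()}, index=rng + pd.Timedelta(days=10))

# to_series() turns the index into a Series labelled by itself, so the
# subtraction lines up row by row with column 0 and yields timedeltas.
diff = df.index.to_series() - df[0]
print(diff)
```

The result is a timedelta64 Series labelled by the original index, so it can be assigned back as a new column if needed.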