Calculate time difference in a pandas DataFrame

Date: 2013-06-26 16:43:59

Tags: python datetime time pandas

I have a pandas DataFrame with the following index:

Index([16/May/2013:23:56:43, 16/May/2013:23:56:42, 16/May/2013:23:56:43, ..., 17/May/2013:23:54:45, 17/May/2013:23:54:45, 17/May/2013:23:54:45], dtype=object)

I calculated the time difference between consecutive rows as follows:

df2['tvalue'] = df2.index
df2['tvalue'] = np.datetime64(df2['tvalue'])
df2['delta'] = (df2['tvalue']-df2['tvalue'].shift()).fillna(0)

This gave me the following output:

Time                  tvalue               delta
16/May/2013:23:56:43  2013-05-01 13:23:56  00:00:00
16/May/2013:23:56:42  2013-05-01 13:23:56  00:00:00
16/May/2013:23:56:43  2013-05-01 13:23:56  00:00:00
16/May/2013:23:56:43  2013-05-01 13:23:56  00:00:00
16/May/2013:23:56:48  2013-05-01 13:23:56  00:00:00
16/May/2013:23:56:48  2013-05-01 13:23:56  00:00:00
16/May/2013:23:56:48  2013-05-01 13:23:56  00:00:00
16/May/2013:23:57:44  2013-05-01 13:23:57  00:00:01
16/May/2013:23:57:44  2013-05-01 13:23:57  00:00:00
16/May/2013:23:57:44  2013-05-01 13:23:57  00:00:00

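But it looks like the time difference has been calculated in years, and the dates in `tvalue` are different from the index as well. What could be going wrong here?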

1 answer:

Answer 0 (score: 2)

Parsing your dates is non-trivial. I thought `strptime` could do it, but it didn't work for me. The times in your example above are just strings, not datetimes.
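For what it's worth, the standard library's `datetime.strptime` should be able to handle this layout if given the format string `%d/%b/%Y:%H:%M:%S`. A minimal sketch, assuming an English (or C) locale, since `%b` matches abbreviated month names for the current locale; the helper name `parse_stamp` is hypothetical:

```python
from datetime import datetime

# Format for strings such as "16/May/2013:23:56:43":
# day/abbreviated-month/year:hour:minute:second.
# %b is locale-dependent; this assumes English month abbreviations.
STAMP_FORMAT = "%d/%b/%Y:%H:%M:%S"


def parse_stamp(s):
    """Parse one log-style timestamp into a datetime (hypothetical helper)."""
    return datetime.strptime(s, STAMP_FORMAT)


stamps = ["16/May/2013:23:56:43", "16/May/2013:23:56:42", "17/May/2013:23:54:45"]
parsed = [parse_stamp(s) for s in stamps]
print(parsed[0])  # 2013-05-16 23:56:43
```

Subtracting two parsed values then gives an ordinary `timedelta`, e.g. `parsed[2] - parsed[0]` spans 23:58:02.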

In [140]: from dateutil import parser

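In [130]: def parse(x):
   .....:     # "16/May/2013:23:56:43" -> "16/May/2013", "23", "56", "43"
   .....:     date, hh, mm, ss = x.split(':')
   .....:     # "16/May/2013" -> "16", "May", "2013"
   .....:     dd, mo, yyyy = date.split('/')
   .....:     # Rebuild as "2013 May 16 23:56:43", which dateutil can parse
   .....:     return parser.parse("%s %s %s %s:%s:%s" % (yyyy,mo,dd,hh,mm,ss))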
   .....: 

In [131]: list(map(parse, idx))
Out[131]: 
[datetime.datetime(2013, 5, 16, 23, 56, 43),
 datetime.datetime(2013, 5, 16, 23, 56, 42),
 datetime.datetime(2013, 5, 16, 23, 56, 43),
 datetime.datetime(2013, 5, 17, 23, 54, 45),
 datetime.datetime(2013, 5, 17, 23, 54, 45),
 datetime.datetime(2013, 5, 17, 23, 54, 45)]

In [132]: pd.to_datetime(list(map(parse, idx)))
Out[132]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-16 23:56:43, ..., 2013-05-17 23:54:45]
Length: 6, Freq: None, Timezone: None

In [133]: df = pd.DataFrame(dict(time = pd.to_datetime(list(map(parse, idx)))))

In [134]: df
Out[134]: 
                 time
0 2013-05-16 23:56:43
1 2013-05-16 23:56:42
2 2013-05-16 23:56:43
3 2013-05-17 23:54:45
4 2013-05-17 23:54:45
5 2013-05-17 23:54:45

In [138]: df['delta'] = (df['time']-df['time'].shift()).fillna(0)

In [139]: df
Out[139]: 
                 time     delta
0 2013-05-16 23:56:43  00:00:00
1 2013-05-16 23:56:42 -00:00:01
2 2013-05-16 23:56:43  00:00:01
3 2013-05-17 23:54:45  23:58:02
4 2013-05-17 23:54:45  00:00:00
5 2013-05-17 23:54:45  00:00:00
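On newer pandas versions the custom parser may not be needed: `pd.to_datetime` accepts a `format` argument, and `Series.diff()` computes the same row-to-row difference as `s - s.shift()`. A sketch under those assumptions, using the sample index values from the question (variable names are illustrative, and `%b` again assumes English month names). The first row's `NaT` is filled with a zero `Timedelta` rather than the integer `0`:

```python
import pandas as pd

# Sample values copied from the index in the question.
idx = pd.Index([
    "16/May/2013:23:56:43",
    "16/May/2013:23:56:42",
    "16/May/2013:23:56:43",
    "17/May/2013:23:54:45",
    "17/May/2013:23:54:45",
    "17/May/2013:23:54:45",
])

# Parse with an explicit format instead of letting the parser guess.
times = pd.to_datetime(idx, format="%d/%b/%Y:%H:%M:%S")

df = pd.DataFrame({"time": times})
# diff() is equivalent to df["time"] - df["time"].shift(); the first row is NaT.
df["delta"] = df["time"].diff().fillna(pd.Timedelta(0))
# The same differences as plain seconds, if a number is more convenient.
df["delta_s"] = df["delta"].dt.total_seconds()
print(df)
```

The `delta` values should match the transcript above (0, -1 s, +1 s, 23:58:02, 0, 0), though newer pandas displays timedeltas with a days component.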