Formatting diff() output on a time series

Posted: 2019-02-05 13:26:44

Tags: python-3.x pandas

I can't get the timedelta output of the diff() function formatted in the time unit I want.

Here is the code:

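import pandas as pd
from numpy import random

# Four columns of random numbers, indexed A..E
df = pd.DataFrame(data=random.randn(5, 4), index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# Overwrite column W with timestamp strings (month/day/year)
df['W'] = ['10/01/2018 12:00:00', '10/03/2018 13:00:00',
           '10/03/2018 12:30:00', '10/04/2018 12:05:00',
           '10/08/2018 12:00:15']

df['W'] = pd.to_datetime(df['W'])
# Difference from the previous row; the result is a timedelta64 column
df['delta'] = df['W'].diff()
print(df)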

This is what I get (the "delta" column):

    W           X           Y           Z           delta
A   2018-10-01  0.218683    1.704266    1.035627    NaT
B   2018-10-03  -1.362903   1.251404    -0.296558   2 days 01:00:00
C   2018-10-03  1.288930    -1.692359   1.185029    -1 days +23:30:00
D   2018-10-04  1.355021    1.144945    -1.294918   0 days 23:35:00
E   2018-10-08  -0.572535   0.236500    -0.435992   3 days 23:55:15
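The `-1 days +23:30:00` entry for row C can look odd. Row C's timestamp is 30 minutes earlier than row B's, and pandas displays a negative timedelta with a negative days component plus a positive time-of-day part. A minimal check, assuming a reasonably recent pandas:

```python
import pandas as pd

# Row C (10/03 12:30) minus row B (10/03 13:00): minus half an hour
delta = pd.Timestamp("2018-10-03 12:30:00") - pd.Timestamp("2018-10-03 13:00:00")

# Displayed as -1 day plus 23.5 hours, which nets out to -30 minutes
print(delta)                  # -1 days +23:30:00
print(delta.total_seconds())  # -1800.0
```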

And this is what I want in the "delta" column:

    W           X           Y           Z           delta
A   2018-10-01  0.218683    1.704266    1.035627    NaT
B   2018-10-03  -1.362903   1.251404    -0.296558   2.04
C   2018-10-03  1.288930    -1.692359   1.185029    -0.02
D   2018-10-04  1.355021    1.144945    -1.294918   0.98
E   2018-10-08  -0.572535   0.236500    -0.435992   3.99

Any ideas?

Thanks for your help!

1 Answer:

Answer 0 (score: 2)

Convert the timedeltas to days: get the seconds with Series.dt.total_seconds, divide by 86400 (60 * 60 * 24, the number of seconds in a day), and finally round:
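As a worked check of that conversion, here are the deltas for rows B (2 days 01:00:00) and E (3 days 23:55:15) from the question's output, written out in seconds and divided by 86400:

$$
\Delta_B = \frac{2 \cdot 86400 + 3600}{86400} = \frac{176400}{86400} \approx 2.0417 \;\to\; 2.04
$$

$$
\Delta_E = \frac{3 \cdot 86400 + 23 \cdot 3600 + 55 \cdot 60 + 15}{86400} = \frac{345315}{86400} \approx 3.9967 \;\to\; 4.00
$$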

# timedelta -> seconds -> days (86400 s per day), rounded to 2 decimals
df['delta'] = df['W'].diff().dt.total_seconds().div(86400).round(2)
print(df)
                    W         X         Y         Z  delta
A 2018-10-01 12:00:00  0.821455  1.481278  1.331864    NaN
B 2018-10-03 13:00:00  0.685609  0.573761  0.287728   2.04
C 2018-10-03 12:30:00  0.953490 -1.689625 -0.344943  -0.02
D 2018-10-04 12:05:00 -0.514984  0.244509 -0.189313   0.98
E 2018-10-08 12:00:15  0.464802  0.845930 -0.503542   4.00
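An equivalent way to get fractional days is to divide the timedelta Series by `pd.Timedelta(days=1)`; the missing first value stays NaN either way. Note that row E comes out as 4.00 above, while the question's desired output shows 3.99: 3 days 23:55:15 is about 3.9967 days, which rounds up. If truncation toward zero was what the asker meant, `np.trunc` on the scaled values reproduces the question's table. This is a sketch assuming a reasonably recent pandas; the explicit `format` string is an assumption that the input strings are month/day/year.

```python
import numpy as np
import pandas as pd

# Timestamps from the question, parsed explicitly as month/day/year
w = pd.to_datetime(
    pd.Series(['10/01/2018 12:00:00', '10/03/2018 13:00:00',
               '10/03/2018 12:30:00', '10/04/2018 12:05:00',
               '10/08/2018 12:00:15'], index=list('ABCDE')),
    format='%m/%d/%Y %H:%M:%S',
)

# Timedelta Series / one-day Timedelta -> float days (NaT becomes NaN)
days = w.diff() / pd.Timedelta(days=1)

rounded = days.round(2)                 # same as the answer: E -> 4.0
truncated = np.trunc(days * 100) / 100  # toward zero: E -> 3.99

print(pd.DataFrame({'rounded': rounded, 'truncated': truncated}))
```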