I can't figure out how to format the time units of the output of diff().
Here is the code:
import pandas as pd
from numpy import random
df = pd.DataFrame(data = random.randn(5,4), index = ['A','B','C','D','E'],
columns = ['W','X','Y','Z'])
df['W'] = ['10/01/2018 12:00:00','10/03/2018 13:00:00',
'10/03/2018 12:30:00','10/04/2018 12:05:00',
'10/08/2018 12:00:15']
df['W']=pd.to_datetime(df['W'])
df['delta']=df['W'].diff()
df
This is what I get (see the "delta" column):
W X Y Z delta
A 2018-10-01 0.218683 1.704266 1.035627 NaT
B 2018-10-03 -1.362903 1.251404 -0.296558 2 days 01:00:00
C 2018-10-03 1.288930 -1.692359 1.185029 -1 days +23:30:00
D 2018-10-04 1.355021 1.144945 -1.294918 0 days 23:35:00
E 2018-10-08 -0.572535 0.236500 -0.435992 3 days 23:55:15
This is what I would like to get in the "delta" column (the difference expressed as a decimal number of days):
W X Y Z delta
A 2018-10-01 0.218683 1.704266 1.035627 NaT
B 2018-10-03 -1.362903 1.251404 -0.296558 2.04
C 2018-10-03 1.288930 -1.692359 1.185029 -0.02
D 2018-10-04 1.355021 1.144945 -1.294918 0.98
E 2018-10-08 -0.572535 0.236500 -0.435992 3.99
Any ideas?
Thanks for your help!
Answer 0: (score: 2)
Convert the timedeltas to days: get the total seconds with Series.dt.total_seconds, divide by 86400 (60 * 60 * 24, the number of seconds in a day), and finally round:
df['delta']=df['W'].diff().dt.total_seconds().div(86400).round(2)
print (df)
W X Y Z delta
A 2018-10-01 12:00:00 0.821455 1.481278 1.331864 NaN
B 2018-10-03 13:00:00 0.685609 0.573761 0.287728 2.04
C 2018-10-03 12:30:00 0.953490 -1.689625 -0.344943 -0.02
D 2018-10-04 12:05:00 -0.514984 0.244509 -0.189313 0.98
E 2018-10-08 12:00:15 0.464802 0.845930 -0.503542 4.00
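
As a possible alternative to going through seconds, a timedelta Series can be divided directly by a one-day pd.Timedelta, which also yields float days. The sketch below is only an illustration: it rebuilds just the W column from the question, and the names w and delta_days are made up for this example.

```python
import pandas as pd

# Rebuild only the datetime column from the question (names are illustrative).
w = pd.to_datetime(pd.Series(
    ['10/01/2018 12:00:00', '10/03/2018 13:00:00',
     '10/03/2018 12:30:00', '10/04/2018 12:05:00',
     '10/08/2018 12:00:15'],
    index=['A', 'B', 'C', 'D', 'E']))

# Dividing a timedelta Series by a one-day Timedelta gives float days;
# the first row has no predecessor, so it stays NaN.
delta_days = (w.diff() / pd.Timedelta(days=1)).round(2)
print(delta_days)
```

Either way, the last row rounds to 4.00 rather than the 3.99 shown in the question, because 3 days 23:55:15 is roughly 3.997 days; getting 3.99 would require truncating instead of rounding.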