I want to get the timedelta intervals between several timestamp columns in a DataFrame. In addition, several of the entries are NaN.
Original DF:
       0      1      2      3      4      5
0  date1  date2    NaN    NaN    NaN    NaN
1  date3  date4  date5  date6  date7  date8
Desired output:
             0            1            2            3            4
0  date2-date1          NaN          NaN          NaN          NaN
1  date4-date3  date5-date4  date6-date5  date7-date6  date8-date7
Answer 0 (score: 1)
I think you can use the following, provided the NaNs are consecutive and run to the end of each row:
import numpy as np
import pandas as pd

df = pd.DataFrame([['2015-01-02', '2015-01-03', np.nan, np.nan],
                   ['2015-01-02', '2015-01-05', '2015-01-07', '2015-01-12']])
print(df)
            0           1           2           3
0  2015-01-02  2015-01-03         NaN         NaN
1  2015-01-02  2015-01-05  2015-01-07  2015-01-12
# Keep the original df intact so the step-by-step calls below still work.
result = df.apply(pd.to_datetime).ffill(axis=1).diff(axis=1)
print(result)
    0       1       2       3
0 NaT  1 days  0 days  0 days
1 NaT  3 days  2 days  5 days
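Note that this result is not quite the desired output from the question: the forward-filled cells come out as 0 days rather than missing, and there is an extra all-NaT first column. If you want the exact layout asked for, one possible follow-up is sketched below. The names `dates` and `deltas` are mine, and it assumes the same sample frame as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['2015-01-02', '2015-01-03', np.nan, np.nan],
                   ['2015-01-02', '2015-01-05', '2015-01-07', '2015-01-12']])

dates = df.apply(pd.to_datetime)

# Same pipeline as above: forward fill along each row, then difference.
deltas = dates.ffill(axis=1).diff(axis=1)

# Put NaT back wherever the original timestamp was missing, so the
# filled cells do not show up as "0 days".
deltas = deltas.where(dates.notna())

# Drop the first column (it has no left neighbour) and renumber from 0.
deltas = deltas.iloc[:, 1:]
deltas.columns = range(deltas.shape[1])

print(deltas)
```

The mask comes from `dates.notna()`, i.e. the frame before filling, so only cells that were missing in the input are blanked out.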
Details:
First convert all columns to datetimes:
print(df.apply(pd.to_datetime))
           0          1          2          3
0 2015-01-02 2015-01-03        NaT        NaT
1 2015-01-02 2015-01-05 2015-01-07 2015-01-12
Then replace the missing values (now NaT) by forward filling the last valid value along each row:
print(df.apply(pd.to_datetime).ffill(axis=1))
           0          1          2          3
0 2015-01-02 2015-01-03 2015-01-03 2015-01-03
1 2015-01-02 2015-01-05 2015-01-07 2015-01-12
Finally, get the differences between neighbouring columns with diff:
print(df.apply(pd.to_datetime).ffill(axis=1).diff(axis=1))
    0       1       2       3
0 NaT  1 days  0 days  0 days
1 NaT  3 days  2 days  5 days
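The condition that the NaNs sit at the end of each row matters because forward filling changes what a gap in the middle means. The small comparison below uses a made-up frame (`gap` is a hypothetical example, not data from the question) with one missing value between two valid timestamps, and runs the pipeline with and without `ffill`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has a gap in the middle, row 1 is complete.
gap = pd.DataFrame([['2015-01-02', np.nan, '2015-01-07'],
                    ['2015-01-02', '2015-01-05', '2015-01-07']])
dates = gap.apply(pd.to_datetime)

# With ffill, the gap is bridged from the last valid timestamp.
with_ffill = dates.ffill(axis=1).diff(axis=1)

# Without ffill, any difference that touches the gap becomes NaT.
without_ffill = dates.diff(axis=1)

print(with_ffill)
print(without_ffill)
```

With `ffill`, row 0 reports 0 days followed by 5 days, so the interval is measured from the last known timestamp. Without it, every interval in row 0 is NaT. Which behaviour is right depends on what a missing timestamp means in your data.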