I have a df like this:
Name Date Time
A 02/20/2021 12:30:06
A 02/20/2021 12:30:20
A 02/21/2021 12:30:20
A 02/22/2021 02:30:30
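I am trying to combine Date and Time into a single datetime, then subtract the previous row from the current row to get a column with the difference in seconds, for example: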
Name Date Time diff
A 02/20/2021 12:30:06
A 02/20/2021 12:30:20 14 seconds
A 02/21/2021 12:30:20 86400 seconds
A 02/22/2021 02:30:30 50410 seconds
This is what I am trying:
df['Datetime'] = df['Date'].astype(str)+' '+df['Time'].astype(str)
df[['diff']] = df.groupby('Name')[['Datetime', 'Result']].diff()
But it gives me output like 0 days 00:00:10. I could not find a suitable solution anywhere. Thanks in advance.
Answer 0: (score: 0)
First create the datetimes with to_datetime, then convert the resulting timedeltas to seconds with Series.dt.total_seconds:
df['Datetime'] = pd.to_datetime(df['Date'].astype(str) +' '+df['Time'].astype(str))
df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds()
print (df)
Name Date Time Datetime diff
0 A 02/20/2021 12:30:06 2021-02-20 12:30:06 NaN
1 A 02/20/2021 12:30:20 2021-02-20 12:30:20 14.0
2 A 02/21/2021 12:30:20 2021-02-21 12:30:20 86400.0
3 A 02/22/2021 02:30:30 2021-02-22 02:30:30 50410.0
For integers, use the nullable integer dtype Int64, which can hold missing values:
df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds().astype('Int64')
print (df)
Name Date Time Datetime diff
0 A 02/20/2021 12:30:06 2021-02-20 12:30:06 <NA>
1 A 02/20/2021 12:30:20 2021-02-20 12:30:20 14
2 A 02/21/2021 12:30:20 2021-02-21 12:30:20 86400
3 A 02/22/2021 02:30:30 2021-02-22 02:30:30 50410
If you need strings with a " seconds" suffix, as in the expected output in the question, apply a custom function with Series.map:
df['Datetime'] = pd.to_datetime(df['Date'].astype(str) +' '+df['Time'].astype(str))
f = lambda x: '' if pd.isna(x) else f'{int(x)} seconds'
df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds().map(f)
print (df)
Name Date Time Datetime diff
0 A 02/20/2021 12:30:06 2021-02-20 12:30:06
1 A 02/20/2021 12:30:20 2021-02-20 12:30:20 14 seconds
2 A 02/21/2021 12:30:20 2021-02-21 12:30:20 86400 seconds
3 A 02/22/2021 02:30:30 2021-02-22 02:30:30 50410 seconds
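To reproduce the first approach end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and passes an explicit format string to to_datetime; the month/day/year reading of the Date column is an assumption based on the sample values, and the helper name diffs is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "A"],
    "Date": ["02/20/2021", "02/20/2021", "02/21/2021", "02/22/2021"],
    "Time": ["12:30:06", "12:30:20", "12:30:20", "02:30:30"],
})

# Assumption: dates are month/day/year. An explicit format avoids
# relying on pandas to infer the layout.
df["Datetime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%m/%d/%Y %H:%M:%S"
)

# Difference to the previous row within each Name group, in seconds.
# The first row of each group has no predecessor, so it becomes NaN.
df["diff"] = df.groupby("Name")["Datetime"].diff().dt.total_seconds()

diffs = df["diff"].tolist()  # hypothetical helper name, used for checking
print(df)
```

The key difference from the attempt in the question is that diff runs on a real datetime column rather than on concatenated strings, so the result is a timedelta that total_seconds can convert.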