My dataset has the following format:
| Time | Category |
=====================
| 12:37 | 'one' |
| 12:39 | 'two' |
| 12:41 | 'two' |
| 12:45 | 'one' |
| 12:46 | 'one' |
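I want to create a new column that measures the time difference between the current row and the last time that row's category was recorded, so that the table becomes: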
| Time | Category | Since_last |
=====================================
| 12:37 | 'one' | 0 min | (0 as it is the first measurement)
| 12:39 | 'two' | 0 min |
| 12:41 | 'two' | 2 min |
| 12:45 | 'one' | 8 min |
| 12:46 | 'one' | 1 min |
How can I do this?
Answer 0 (score: 2)
Convert the Time series to timedelta, then use groupby + diff:
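import pandas as pd
# Append ':00' so each value has the HH:MM:SS form, then convert to timedelta
df['Time'] = pd.to_timedelta(df['Time'] + ':00')
# The first row of each category has no predecessor, so diff() yields NaT there; fill it with a zero Timedelta
df['Diff'] = df.groupby('Category')['Time'].diff().fillna(pd.Timedelta(0))
print(df)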
Time Category Diff
0 12:37:00 'one' 00:00:00
1 12:39:00 'two' 00:00:00
2 12:41:00 'two' 00:02:00
3 12:45:00 'one' 00:08:00
4 12:46:00 'one' 00:01:00
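For anyone who wants to try this without an existing `df`, here is a minimal self-contained sketch of the same groupby + diff idea. The sample frame is rebuilt by hand from the table in the question, and the names `Since_last` and `since_last_minutes` are only illustrative; it also assumes the rows are already in time order.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "Time": ["12:37", "12:39", "12:41", "12:45", "12:46"],
    "Category": ["one", "two", "two", "one", "one"],
})

# Append ":00" so each string has the HH:MM:SS form, then convert.
df["Time"] = pd.to_timedelta(df["Time"] + ":00")

# diff() within each category leaves NaT on that category's first row;
# replace it with a zero-length Timedelta.
df["Since_last"] = df.groupby("Category")["Time"].diff().fillna(pd.Timedelta(0))

# Whole minutes as plain integers, in the original row order.
since_last_minutes = (df["Since_last"].dt.total_seconds() // 60).astype(int).tolist()
print(since_last_minutes)  # [0, 0, 2, 8, 1]
```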
If the string format matters to you:
df['Diff'] = df['Diff'].apply(lambda x: f'{int(x.seconds/60)} min')
print(df)
Time Category Diff
0 12:37:00 'one' 0 min
1 12:39:00 'two' 0 min
2 12:41:00 'two' 2 min
3 12:45:00 'one' 8 min
4 12:46:00 'one' 1 min
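One thing to watch with `x.seconds`: it only holds the part of the timedelta below one day, so a gap of more than 24 hours would be under-reported. A possible alternative, sketched here on made-up gap values (the `gaps` series is hypothetical, not from the question), is to go through `total_seconds()`, which also avoids the row-by-row `apply`:

```python
import pandas as pd

# Hypothetical gaps in minutes; the last one is longer than a day (1445 min).
gaps = pd.Series(pd.to_timedelta([0, 2, 8, 1445], unit="min"))

# total_seconds() counts whole days too, unlike the .seconds attribute.
minutes = (gaps.dt.total_seconds() // 60).astype(int)
labels = (minutes.astype(str) + " min").tolist()
print(labels)  # ['0 min', '2 min', '8 min', '1445 min']

# For comparison: .seconds drops the day component of the last value.
print(gaps.iloc[-1].seconds // 60)  # 5
```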
Answer 1 (score: 0)
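Convert the time:
# Keep full datetimes here: datetime.time objects (from .dt.time) cannot be subtracted
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M')
Use groupby and diff:
# Assign a new column instead of rebuilding df, so Category is kept
df['Diff'] = df.groupby('Category')['Time'].diff().fillna(pd.Timedelta(0))
Convert to minutes:
df['Diff'] = df['Diff'].apply(lambda x: f'{int(x.seconds/60)} min')
Output: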
Time Category
0 12:37:00 one
1 12:39:00 two
2 12:41:00 two
3 12:45:00 one
4 12:46:00 one
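A self-contained variant of this `to_datetime` approach, as a rough sketch only. It assumes the sample data from the question and that the rows are already in time order; the helper names `parsed` and `gap` are illustrative. Parsing into a separate Series leaves the original `Time` strings untouched.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "Time": ["12:37", "12:39", "12:41", "12:45", "12:46"],
    "Category": ["one", "two", "two", "one", "one"],
})

# Parse into a helper Series; every value gets the same default date,
# so the date part cancels out when consecutive values are subtracted.
parsed = pd.to_datetime(df["Time"], format="%H:%M")

# Group the parsed times by the Category column and take per-group differences.
gap = parsed.groupby(df["Category"]).diff().fillna(pd.Timedelta(0))

# Format as "N min" strings.
df["Diff"] = (gap.dt.total_seconds() // 60).astype(int).astype(str) + " min"
print(df)
```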