I want to calculate the duration of each conversation, grouped by ID. Here is the data:
ID Ques Time Expected output
----------------------------------
11 Hi 11.21 1min
11 Hello 11.22
13 hey 12.11 10mins
13 what 12.22
14 so 01.01 2mins
14 ok 01.03
15 hru 02.00
15 hii 02.01 3mins
15 hey 02.02
----------------------------------
What I tried:
First_last_cover = English_Logs['Date'].agg(['min','max'])
print ("First Conversation and Last Conversation of the month", First_last_cover)
Answer 0 (score: 0)
I think you need to convert the Time column with to_timedelta, then use transform to get the per-ID difference as a new column:
df['Time'] = pd.to_timedelta(df['Time'].astype(str).str.replace('.', ':', regex=False).add(':00'))
df['new'] = df.groupby('ID')['Time'].transform(lambda x: x.max() - x.min())
print (df)
ID Ques Time Expected output new
0 11 Hi 11:21:00 1min 00:01:00
1 11 Hello 11:22:00 NaN 00:01:00
2 13 hey 12:11:00 10mins 00:11:00
3 13 what 12:22:00 NaN 00:11:00
4 14 so 01:01:00 2mins 00:02:00
5 14 ok 01:03:00 NaN 00:02:00
6 15 hru 02:00:00 NaN 00:02:00
7 15 hii 02:01:00 3mins 00:02:00
8 15 hey 02:02:00 NaN 00:02:00
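For anyone who wants to reproduce this, here is a minimal self-contained sketch of the steps above. It assumes the Time column was read as text in HH.MM form (if it was parsed as a float, trailing zeros such as 02.00 would be lost), and the variable name minutes is only illustrative:

```python
import pandas as pd

# Sample data copied from the question; Time is assumed to be text in HH.MM form.
df = pd.DataFrame({
    'ID': [11, 11, 13, 13, 14, 14, 15, 15, 15],
    'Ques': ['Hi', 'Hello', 'hey', 'what', 'so', 'ok', 'hru', 'hii', 'hey'],
    'Time': ['11.21', '11.22', '12.11', '12.22', '01.01',
             '01.03', '02.00', '02.01', '02.02'],
})

# regex=False treats '.' as a literal dot rather than a regex wildcard.
df['Time'] = pd.to_timedelta(df['Time'].str.replace('.', ':', regex=False) + ':00')

# Broadcast each ID's (latest - earliest) time back onto every row of that ID.
df['new'] = df.groupby('ID')['Time'].transform(lambda x: x.max() - x.min())

# The same durations expressed in minutes.
minutes = df['new'].dt.total_seconds().div(60)
print(df.assign(minutes=minutes))
```

Note that ID 13 comes out as 11 minutes and ID 15 as 2 minutes, which differs from the "Expected output" column in the question.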
If you want to convert the timedeltas to minutes, add total_seconds and divide by 60:
df['new'] = df['new'].dt.total_seconds().div(60)
print (df)
ID Ques Time Expected output new
0 11 Hi 11:21:00 1min 1.0
1 11 Hello 11:22:00 NaN 1.0
2 13 hey 12:11:00 10mins 11.0
3 13 what 12:22:00 NaN 11.0
4 14 so 01:01:00 2mins 2.0
5 14 ok 01:03:00 NaN 2.0
6 15 hru 02:00:00 NaN 2.0
7 15 hii 02:01:00 3mins 2.0
8 15 hey 02:02:00 NaN 2.0
...or use agg to get a new DataFrame with one row per ID:
df1 = (df.groupby('ID')['Time']
         .agg(lambda x: x.max() - x.min())
         .dt.total_seconds()
         .div(60)
         .reset_index())
print (df1)
ID Time
0 11 1.0
1 13 11.0
2 14 2.0
3 15 2.0
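Since the question already tried agg(['min','max']), the same per-ID result can probably be reached without a lambda by aggregating min and max first and then subtracting. This is only a sketch, assuming Time has already been converted to timedelta; the column name minutes is made up for the example:

```python
import pandas as pd

# Time is assumed to be already converted to timedelta, as in the answer above.
df = pd.DataFrame({
    'ID': [11, 11, 13, 13, 14, 14, 15, 15, 15],
    'Time': pd.to_timedelta(['11:21:00', '11:22:00', '12:11:00', '12:22:00',
                             '01:01:00', '01:03:00', '02:00:00', '02:01:00',
                             '02:02:00']),
})

# One row per ID with the earliest and latest timestamp.
span = df.groupby('ID')['Time'].agg(['min', 'max'])

# Duration per ID in minutes, with ID turned back into a regular column.
df1 = ((span['max'] - span['min'])
       .dt.total_seconds()
       .div(60)
       .rename('minutes')
       .reset_index())
print(df1)
```

Using the built-in 'min' and 'max' aggregations avoids calling a Python function once per group, which tends to matter on larger logs.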