How to find the duration of each chat log conversation using pandas in Python

Time: 2018-03-13 06:12:54

Tags: python pandas nlp pandas-groupby

I want to calculate the duration of each conversation, grouping the rows by ID. Here is the data:

ID  Ques   Time  Expected output
----------------------------------
11   Hi    11.21   1min
11   Hello 11.22

13   hey   12.11   10mins   
13   what  12.22

14   so    01.01   2mins
14   ok    01.03 

15   hru   02.00
15   hii   02.01   3mins
15   hey   02.02
----------------------------------

What I tried:

First_last_cover = English_Logs['Date'].agg(['min','max'])
print ("First Conversation and Last Conversation of the month", First_last_cover)

1 Answer:

Answer 0 (score: 0)

I think you need to convert the times with to_timedelta, then use transform to get the per-ID difference as a new column:

# regex=False treats '.' as a literal dot rather than a regex wildcard
df['Time'] = pd.to_timedelta(df['Time'].astype(str).str.replace('.', ':', regex=False).add(':00'))

df['new'] = df.groupby('ID')['Time'].transform(lambda x: x.max() - x.min())
print (df)
   ID   Ques     Time Expected output      new
0  11     Hi 11:21:00            1min 00:01:00
1  11  Hello 11:22:00             NaN 00:01:00
2  13    hey 12:11:00          10mins 00:11:00
3  13   what 12:22:00             NaN 00:11:00
4  14     so 01:01:00           2mins 00:02:00
5  14     ok 01:03:00             NaN 00:02:00
6  15    hru 02:00:00             NaN 00:02:00
7  15    hii 02:01:00           3mins 00:02:00
8  15    hey 02:02:00             NaN 00:02:00
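
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question's table, and it assumes Time was read as floats (as a CSV reader would likely infer), so the values are zero-padded to HH.MM before conversion; otherwise 02.00 would turn into "2.0". The names hhmm and by_id are illustrative only.

```python
import pandas as pd

# Sample frame rebuilt from the question's table; Time is deliberately float
# here to mimic what a CSV reader would infer for values like 02.00.
df = pd.DataFrame({
    "ID": [11, 11, 13, 13, 14, 14, 15, 15, 15],
    "Ques": ["Hi", "Hello", "hey", "what", "so", "ok", "hru", "hii", "hey"],
    "Time": [11.21, 11.22, 12.11, 12.22, 1.01, 1.03, 2.00, 2.01, 2.02],
})

# Format as zero-padded HH.MM first so 2.0 becomes "02.00" rather than "2.0".
hhmm = df["Time"].map("{:05.2f}".format)
df["Time"] = pd.to_timedelta(hhmm.str.replace(".", ":", regex=False) + ":00")

# Built-in "max"/"min" transforms avoid calling a Python lambda per group.
by_id = df.groupby("ID")["Time"]
df["new"] = by_id.transform("max") - by_id.transform("min")
print(df)
```

Subtracting the two built-in transforms gives the same per-row result as the lambda version and tends to be faster on large logs.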

If you want to convert the timedeltas to minutes, use total_seconds and divide by 60:

df['new'] = df['new'].dt.total_seconds().div(60)
print (df)
   ID   Ques     Time Expected output   new
0  11     Hi 11:21:00            1min   1.0
1  11  Hello 11:22:00             NaN   1.0
2  13    hey 12:11:00          10mins  11.0
3  13   what 12:22:00             NaN  11.0
4  14     so 01:01:00           2mins   2.0
5  14     ok 01:03:00             NaN   2.0
6  15    hru 02:00:00             NaN   2.0
7  15    hii 02:01:00           3mins   2.0
8  15    hey 02:02:00             NaN   2.0
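
As an alternative sketch for the minutes conversion, a timedelta Series can also be divided by a one-minute Timedelta, which yields floats directly. The durations Series below is made-up sample data standing in for the new column.

```python
import pandas as pd

# Stand-in for df['new']: three per-conversation durations.
durations = pd.Series(pd.to_timedelta(["00:01:00", "00:11:00", "00:02:00"]))

# Dividing by a unit Timedelta returns the duration as a float count of minutes.
minutes = durations / pd.Timedelta(minutes=1)
print(minutes.tolist())
```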

...or aggregate into a new DataFrame with agg:

df1 = (df.groupby('ID')['Time']
        .agg(lambda x: x.max() - x.min())
        .dt.total_seconds()
        .div(60)
        .reset_index())  # turn the ID index back into a column
print (df1)

   ID  Time
0  11   1.0
1  13  11.0
2  14   2.0
3  15   2.0
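
If the start and end of each conversation are also useful, a variation on the aggregation above is to use named aggregation. This is a sketch under assumptions: it needs pandas 0.25 or later, the Time column is already timedelta, and the names summary, start, end and minutes are illustrative rather than part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question, with Time already converted.
df = pd.DataFrame({
    "ID": [11, 11, 13, 13, 14, 14, 15, 15, 15],
    "Time": pd.to_timedelta([
        "11:21:00", "11:22:00", "12:11:00", "12:22:00", "01:01:00",
        "01:03:00", "02:00:00", "02:01:00", "02:02:00",
    ]),
})

# One row per conversation: first message time, last message time.
summary = df.groupby("ID")["Time"].agg(start="min", end="max").reset_index()

# Duration in minutes as a plain float column.
summary["minutes"] = (summary["end"] - summary["start"]).dt.total_seconds().div(60)
print(summary)
```

Note that this treats Time as a clock time within a single day, so a conversation that runs past midnight would need a full date as well.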