Finding the mean duration (H:M:S) in python pandas

Date: 2017-10-07 12:08:35

Tags: python pandas datetime mean

I am trying to find the mean duration in a pandas DataFrame. I tried the code below and got this error:

TypeError: Could not convert 1:10:4200:38:5800:42:142:30:4100:19:22 to numeric

Code:

import pandas as pd

duration=['1:10:42','38:58','42:14','2:30:41','19:22']
dist=[8,5,6,17,3]
dd=list(zip(duration,dist))
df=pd.DataFrame(dd,columns=['duration','dist'])
print(df)
print('')
max_dist=df['dist'].max()
mean_dist=df['dist'].mean()
df['duration'] = df['duration'].apply(lambda x: x if len(str(x)) ==7 else '00:'+str(x)) 
print(df['duration'])
pd.to_datetime(df['duration'],format='%H:%M:%S').dt.time
max_duration=df['duration'].max()
mean_duration=df['duration'].mean()
print('')
print('max dist =',max_dist,'ave dist =',mean_dist)
print('max duration =',max_duration,'ave duration =',mean_duration)

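The max duration returns the correct value. Does the error message mean that a datetime format cannot be used to compute a mean, or is there another way that I'm missing? Any help is appreciated.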
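A minimal sketch of what goes wrong, assuming a pandas 1.x or 2.x setup like the question's: the `duration` column still holds strings (the result of `pd.to_datetime(...).dt.time` is never assigned back), so `mean()` has nothing numeric to average and raises `TypeError`. The run-together values in the message appear to be the strings concatenated by an internal sum before pandas tries to convert the result to a number. Converting with `pd.to_timedelta` gives a `timedelta64` column that `mean()` can handle.

```python
import pandas as pd

# The durations as plain strings, already padded to H:M:S as in the question
s = pd.Series(['1:10:42', '00:38:58', '00:42:14', '2:30:41', '00:19:22'])

# Averaging strings fails: there is no numeric value to average
try:
    s.mean()
    string_mean_failed = False
except TypeError as exc:
    string_mean_failed = True
    print('TypeError:', exc)

# After conversion the column holds durations, and mean() works
t = pd.to_timedelta(s)
print(t.mean())   # 0 days 01:04:23.400000
```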

3 answers:

Answer 0 (score: 3)

Demo:

In [78]: s = pd.Series(['1:10:42','38:58','42:14','2:30:41','19:22'])

In [79]: s
Out[79]:
0    1:10:42
1      38:58
2      42:14
3    2:30:41
4      19:22
dtype: object

In [80]: s[s.str.match(r'^\d+\:\d+$')] = '00:' + s

In [81]: s
Out[81]:
0     1:10:42
1    00:38:58
2    00:42:14
3     2:30:41
4    00:19:22
dtype: object

In [82]: t = pd.to_timedelta(s)

In [83]: t
Out[83]:
0   01:10:42
1   00:38:58
2   00:42:14
3   02:30:41
4   00:19:22
dtype: timedelta64[ns]

In [84]: t.mean()
Out[84]: Timedelta('0 days 01:04:23.400000')
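The title asks for the result as H:M:S, while `t.mean()` returns a `Timedelta` with fractional seconds. A small sketch of one way to render it, assuming rounding to whole seconds is acceptable; `to_hms` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def to_hms(td: pd.Timedelta) -> str:
    """Format a Timedelta as H:MM:SS, rounded to whole seconds (hypothetical helper)."""
    total = round(td.total_seconds())          # e.g. 3863.4 -> 3863
    sign = '-' if total < 0 else ''
    hours, rem = divmod(abs(total), 3600)      # hours may exceed 24; days are folded in
    minutes, seconds = divmod(rem, 60)
    return f'{sign}{hours}:{minutes:02d}:{seconds:02d}'

t = pd.to_timedelta(pd.Series(['1:10:42', '00:38:58', '00:42:14', '2:30:41', '00:19:22']))
print(to_hms(t.mean()))   # 1:04:23
print(to_hms(t.max()))    # 2:30:41
```

Working from `total_seconds()` rather than the `Timedelta` string keeps durations longer than a day as a plain hour count instead of "1 days ...".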

Answer 1 (score: 3)

Assign the result of pd.to_timedelta back to the column, then take the mean:

df['duration'] = pd.to_timedelta(df['duration'])
print('max duration =',max_duration,'ave duration =',df['duration'].mean())

Output:

max duration = 02:30:41 ave duration = 0 days 01:04:23.400000
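Putting this together, a self-contained version of the question's script might look like the sketch below. It swaps the `len(str(x)) == 7` check for counting colons; that change is my assumption, since a length test would wrongly pad a value such as '10:00:00' (length 8).

```python
import pandas as pd

duration = ['1:10:42', '38:58', '42:14', '2:30:41', '19:22']
dist = [8, 5, 6, 17, 3]
df = pd.DataFrame({'duration': duration, 'dist': dist})

# Pad M:S values to H:M:S; values that already contain two colons are kept as-is
has_hours = df['duration'].str.count(':') == 2
df['duration'] = df['duration'].where(has_hours, '00:' + df['duration'])

# Convert to timedelta64 so that max() and mean() operate on durations, not strings
df['duration'] = pd.to_timedelta(df['duration'])

max_dist = df['dist'].max()
mean_dist = df['dist'].mean()
max_duration = df['duration'].max()
mean_duration = df['duration'].mean()

print('max dist =', max_dist, 'ave dist =', mean_dist)
print('max duration =', max_duration, 'ave duration =', mean_duration)
```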

Answer 2 (score: 1)

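One way is to convert the duration column to timedelta (this relies on the values already being padded to H:M:S, as the question's code does):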

df['duration'] = pd.to_timedelta(df['duration'])

After that, this no longer raises an error:

mean_duration=df['duration'].mean()
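If a plain number is more convenient than a `Timedelta`, a variant (not from the answer above) is to average the durations in seconds with `Series.dt.total_seconds()` and convert back with `pd.to_timedelta(..., unit='s')`. A sketch, assuming the values are already padded to H:M:S:

```python
import pandas as pd

durations = pd.to_timedelta(pd.Series(['1:10:42', '00:38:58', '00:42:14', '2:30:41', '00:19:22']))

# Mean duration as a float number of seconds
mean_seconds = durations.dt.total_seconds().mean()
print(mean_seconds)   # 3863.4

# Convert back to a Timedelta if needed (float-to-nanosecond conversion may round slightly)
mean_td = pd.to_timedelta(mean_seconds, unit='s')
print(mean_td)
```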