I'm trying to find the mean duration in a pandas DataFrame. I tried the following code and got this error:
TypeError: Could not convert 1:10:4200:38:5800:42:142:30:4100:19:22 to numeric
Code:
import pandas as pd
duration=['1:10:42','38:58','42:14','2:30:41','19:22']
dist=[8,5,6,17,3]
dd=list(zip(duration,dist))
df=pd.DataFrame(dd,columns=['duration','dist'])
print(df)
print('')
max_dist=df['dist'].max()
mean_dist=df['dist'].mean()
df['duration'] = df['duration'].apply(lambda x: x if len(str(x)) ==7 else '00:'+str(x))
print(df['duration'])
pd.to_datetime(df['duration'],format='%H:%M:%S').dt.time
max_duration=df['duration'].max()
mean_duration=df['duration'].mean()
print('')
print('max dist =',max_dist,'ave dist =',mean_dist)
print('max duration =',max_duration,'ave duration =',mean_duration)
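The maximum duration returns the correct value. Does the error message mean that the datetime format cannot be used for a mean, or is there another approach I'm missing? Any help is appreciated.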
Answer 0 (score: 3)
Demo:
In [78]: s = pd.Series(['1:10:42','38:58','42:14','2:30:41','19:22'])
In [79]: s
Out[79]:
0 1:10:42
1 38:58
2 42:14
3 2:30:41
4 19:22
dtype: object
In [80]: s[s.str.match(r'^\d+\:\d+$')] = '00:' + s
In [81]: s
Out[81]:
0 1:10:42
1 00:38:58
2 00:42:14
3 2:30:41
4 00:19:22
dtype: object
In [82]: t = pd.to_timedelta(s)
In [83]: t
Out[83]:
0 01:10:42
1 00:38:58
2 00:42:14
3 02:30:41
4 00:19:22
dtype: timedelta64[ns]
In [84]: t.mean()
Out[84]: Timedelta('0 days 01:04:23.400000')
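The demo above can be wrapped in a small helper. This is only a sketch: the function name `mean_duration` is hypothetical, and it assumes every entry is either `MM:SS` or `H:MM:SS`.

```python
import pandas as pd


def mean_duration(values):
    """Return the mean of 'H:MM:SS' / 'MM:SS' strings as a pandas Timedelta.

    Hypothetical helper for illustration; not part of pandas.
    """
    s = pd.Series(values, dtype="object")
    # Entries with a single colon are 'MM:SS'; prefix them with a zero hour.
    short = s.str.match(r"^\d+:\d+$")
    s = s.where(~short, "00:" + s)
    # Timedelta values support arithmetic, so mean() is well defined.
    return pd.to_timedelta(s).mean()


result = mean_duration(['1:10:42', '38:58', '42:14', '2:30:41', '19:22'])
print(result)
```

Unlike the in-place boolean-mask assignment in the demo, `Series.where` builds a new Series and leaves the input untouched.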
Answer 1 (score: 3)
Assign the result of pd.to_timedelta back to the column, then take the mean:
df['duration'] = pd.to_timedelta(df['duration'])
print('max duration =',max_duration,'ave duration =',df['duration'].mean())
Output:
max duration = 02:30:41 ave duration = 0 days 01:04:23.400000
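For reference, here is a sketch of the question's script with this change applied end to end. It keeps the `00:` padding step from the question, but tests for two colons instead of a string length of 7 (my substitution, so that a value such as `10:05:30` would also be left alone). It also computes the maximum from the converted column, so both statistics are Timedelta values rather than a string and a Timedelta.

```python
import pandas as pd

df = pd.DataFrame({
    'duration': ['1:10:42', '38:58', '42:14', '2:30:41', '19:22'],
    'dist': [8, 5, 6, 17, 3],
})

# Pad 'MM:SS' values (one colon) with a zero hour so every entry is H:MM:SS.
padded = df['duration'].apply(lambda x: x if x.count(':') == 2 else '00:' + x)

# Assign the converted values back; to_timedelta does not modify its input.
df['duration'] = pd.to_timedelta(padded)

max_duration = df['duration'].max()
mean_duration = df['duration'].mean()
print('max duration =', max_duration, 'ave duration =', mean_duration)
```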
Answer 2 (score: 1)
One approach is to convert the duration column to a timedelta column:
df['duration'] = pd.to_timedelta(df['duration'])
After that, this no longer raises an error:
mean_duration=df['duration'].mean()
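As for why the original code failed, the error message itself suggests the cause: the column still held strings, and the long value in the message is those strings joined end to end. The conversion in the question was computed but never assigned back to the column. A minimal sketch of both points, using a shortened two-element Series for illustration:

```python
import pandas as pd

s = pd.Series(['1:10:42', '00:38:58'], dtype='object')

# Summing an object column of strings concatenates them; a mean cannot
# be computed from such a sum.
concatenated = s.sum()
print(concatenated)  # 1:10:4200:38:58

# The conversion returns a new Series; s itself is unchanged unless the
# result is assigned back.
converted = pd.to_timedelta(s)
print(s.dtype, converted.dtype)
```

`max()` still appeared to work on the string column because strings can be compared, but that comparison is lexicographic, so it only matches the true maximum by coincidence.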