What is the average trip duration for Subscribers on any weekday (Monday to Friday)? My dataset looks like this:
         tripduration           starttime   User Type
0                 732   7/1/2015 00:00:03  Subscriber
1                 322   7/1/2015 00:00:06  Subscriber
2                 790   7/1/2015 00:00:17  Subscriber
3                1228   7/1/2015 00:00:23  Subscriber
4                1383   7/1/2015 00:00:44  Subscriber
5                 603   7/1/2015 00:01:00  Subscriber
6                 520   7/1/2015 00:01:03  Subscriber
7                 289   7/1/2015 00:01:06  Subscriber
8                1771   7/1/2015 00:01:25    Customer
9                 813   7/1/2015 00:01:41  Subscriber
10               1735   7/1/2015 00:01:50    Customer
11                832   7/1/2015 00:01:58  Subscriber
12               1210   7/1/2015 00:02:06  Subscriber
13                746   7/1/2015 00:02:07  Subscriber
14                749   7/1/2015 00:02:26  Subscriber
15                463   7/1/2015 00:02:26  Subscriber
16                331   7/1/2015 00:02:35  Subscriber
17                951   7/1/2015 00:02:43    Customer
18               1352   7/1/2015 00:02:47    Customer
19                275   7/1/2015 00:02:47  Subscriber
20                199   7/1/2015 00:03:05  Subscriber
21                383   7/1/2015 00:03:16    Customer
22               4210   7/1/2015 00:03:27  Subscriber
23                584   7/1/2015 00:03:34  Subscriber
24                735   7/1/2015 00:03:48  Subscriber
25                827   7/1/2015 00:03:56  Subscriber
26                677   7/1/2015 00:03:57  Subscriber
27               2371   7/1/2015 00:03:58    Customer
28                666   7/1/2015 00:04:03  Subscriber
29                999   7/1/2015 00:04:17  Subscriber
...               ...                 ...         ...
1085646           243  7/31/2015 23:57:25  Subscriber
1085647          1378  7/31/2015 23:57:29    Customer
1085648           230  7/31/2015 23:57:32  Subscriber
1085649          1669  7/31/2015 23:57:33  Subscriber
1085650           493  7/31/2015 23:57:44  Subscriber
1085651           822  7/31/2015 23:57:54  Subscriber
1085652           617  7/31/2015 23:58:03  Subscriber
1085653           349  7/31/2015 23:58:08  Subscriber
1085654           818  7/31/2015 23:58:12    Customer
1085655          2062  7/31/2015 23:58:15  Subscriber
1085656           945  7/31/2015 23:58:18    Customer
1085657           346  7/31/2015 23:58:24  Subscriber
1085658           399  7/31/2015 23:58:27  Subscriber
1085659           641  7/31/2015 23:58:42  Subscriber
1085660          1872  7/31/2015 23:58:43  Subscriber
1085661         12065  7/31/2015 23:58:51    Customer
1085662           265  7/31/2015 23:58:53  Subscriber
1085663           936  7/31/2015 23:58:58  Subscriber
1085664           395  7/31/2015 23:59:04  Subscriber
1085665           238  7/31/2015 23:59:10  Subscriber
1085666           551  7/31/2015 23:59:24  Subscriber
1085667           423  7/31/2015 23:59:23    Customer
1085668          1623  7/31/2015 23:59:24  Subscriber
1085669          1632  7/31/2015 23:59:24  Subscriber
1085670           305  7/31/2015 23:59:38  Subscriber
1085671           275  7/31/2015 23:59:40  Subscriber
1085672           530  7/31/2015 23:59:41  Subscriber
1085673           273  7/31/2015 23:59:42    Customer
1085674          1273  7/31/2015 23:59:56  Subscriber
1085675          1667  7/31/2015 23:59:59  Subscriber
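The function should return the mean as a float rounded to two decimal places:

a4()

I'm stuck here calculating the mean of tripduration for weekdays (Monday to Friday):

def a4(rides):
    df1 = rides[rides['User Type'] == 'Subscriber']
    df1['starttime'] = df1['starttime'].apply(pd.to_datetime)  # convert object into datetime

I tried to parse starttime using

parser.parse(df1['starttime'])

but got an error.

What is the correct way to get the weekday mean?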
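The error from parser.parse(df1['starttime']) most likely comes from dateutil's parser.parse accepting a single string (or a file-like stream), not a whole pandas Series. A minimal sketch, assuming the starttime strings follow the month/day/year layout seen in the sample rows (the explicit format string below is that assumption): parse one value with parser.parse, map it over the column element-wise, or let pd.to_datetime parse the whole column in one call.

```python
import pandas as pd
from dateutil import parser

# Two starttime values copied from the sample data.
s = pd.Series(["7/1/2015 00:00:03", "7/31/2015 23:57:25"])

# parser.parse works on one string at a time ...
first = parser.parse(s.iloc[0])
print(first)  # 2015-07-01 00:00:03

# ... so to use it on a column, apply it element-wise,
elementwise = pd.to_datetime(s.map(parser.parse))

# ... or, typically much faster on a million rows, parse the whole column at once.
vectorized = pd.to_datetime(s, format="%m/%d/%Y %H:%M:%S")

print(vectorized.dt.dayofweek.tolist())  # [2, 4]  (Wednesday, Friday)
```

Both routes give the same timestamps; the vectorized call avoids a Python-level function call per row.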
Answer 0 (score: 2)
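I think you need to convert the starttime column with to_datetime first, then filter with boolean indexing.

If you need a single scalar value for all workdays, select the tripduration column with loc and take its mean: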
import pandas as pd

def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    # dayofweek: Monday=0 ... Sunday=6, so < 5 keeps Monday-Friday
    m = (rides['starttime'].dt.dayofweek < 5) & (rides['User Type'] == 'Subscriber')
    return round(rides.loc[m, 'tripduration'].mean(), 2)

print(a4(rides))
825.33
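To try the scalar version without the full file, here is a minimal self-contained sketch. It assumes starttime always uses the month/day/year layout from the sample (hence the explicit format), and it builds a tiny DataFrame from four rows of the question plus one hypothetical Saturday row so the weekday filter has something to drop. Unlike a4 above, it parses into a local variable, so the caller's rides DataFrame is not modified; weekday_subscriber_mean is only an illustrative name.

```python
import pandas as pd


def weekday_subscriber_mean(rides: pd.DataFrame) -> float:
    """Mean tripduration of Subscriber trips starting Monday-Friday, rounded to 2 decimals."""
    # Parse into a new Series so the caller's DataFrame is left untouched.
    start = pd.to_datetime(rides["starttime"], format="%m/%d/%Y %H:%M:%S")
    # dayofweek: Monday=0 ... Sunday=6, so < 5 keeps Monday-Friday.
    mask = (start.dt.dayofweek < 5) & (rides["User Type"] == "Subscriber")
    return round(float(rides.loc[mask, "tripduration"].mean()), 2)


# Four rows copied from the question, plus one hypothetical Saturday row.
rides = pd.DataFrame(
    {
        "tripduration": [732, 1771, 243, 1378, 5000],
        "starttime": [
            "7/1/2015 00:00:03",   # Wednesday
            "7/1/2015 00:01:25",   # Wednesday
            "7/31/2015 23:57:25",  # Friday
            "7/31/2015 23:57:29",  # Friday
            "7/4/2015 10:00:00",   # hypothetical Saturday row, filtered out
        ],
        "User Type": ["Subscriber", "Customer", "Subscriber", "Customer", "Subscriber"],
    }
)

print(weekday_subscriber_mean(rides))  # 487.5
```

Only the two weekday Subscriber trips (732 and 243) survive the mask, so the result is their mean.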
If you need a separate mean for each weekday, filter with the same two conditions, then groupby the dayofweek values and aggregate with mean:
def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    df1 = rides[(rides['User Type'] == 'Subscriber') & (rides['starttime'].dt.dayofweek < 5)]
    # one mean per weekday number (Monday=0 ... Friday=4)
    return df1.groupby(df1['starttime'].dt.dayofweek)['tripduration'].mean().round(2)

print(a4(rides))
starttime
2    840.96
4    809.71
Name: tripduration, dtype: float64
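The scalar from the first version and the per-day Series answer different questions: the overall weekday mean weights each day by how many trips it has, so averaging the per-day means generally gives a different number. A small sketch with hypothetical trip durations (three Wednesday trips, one Friday trip) shows the gap and how weighting each day's mean by its trip count recovers the overall mean.

```python
import pandas as pd

# Hypothetical Subscriber weekday trips: three on Wednesday 2015-07-01, one on Friday 2015-07-03.
trips = pd.DataFrame(
    {
        "tripduration": [100, 200, 300, 1000],
        "starttime": pd.to_datetime(
            ["2015-07-01 08:00", "2015-07-01 09:00", "2015-07-01 10:00", "2015-07-03 08:00"]
        ),
    }
)

by_day = trips.groupby(trips["starttime"].dt.dayofweek)["tripduration"]
per_day = by_day.mean()    # Wednesday (2): 200.0, Friday (4): 1000.0
counts = by_day.count()    # Wednesday: 3 trips, Friday: 1 trip

overall = trips["tripduration"].mean()              # every trip counts once
mean_of_means = per_day.mean()                      # every day counts once
weighted = (per_day * counts).sum() / counts.sum()  # day means weighted by trip count

print(overall, mean_of_means, weighted)  # 400.0 600.0 400.0
```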
If you want day names instead of day numbers, use day_name() (older pandas versions exposed this as the weekday_name attribute, which was removed in pandas 1.0):

def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    df1 = rides[(rides['User Type'] == 'Subscriber') & (rides['starttime'].dt.dayofweek < 5)]
    return df1.groupby(df1['starttime'].dt.day_name())['tripduration'].mean().round(2)

print(a4(rides))
starttime
Friday       809.71
Wednesday    840.96
Name: tripduration, dtype: float64
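Because the day names are strings, groupby sorts them alphabetically, which is why Friday is listed before Wednesday. A minimal sketch for calendar order instead, assuming the same column names and date layout as the question: reindex the result with an explicit Monday-Friday list (WEEKDAYS and weekday_means_by_name are illustrative names). Days with no Subscriber trips come back as NaN rather than disappearing, which makes gaps in the data visible.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def weekday_means_by_name(rides: pd.DataFrame) -> pd.Series:
    """Per-weekday mean tripduration for Subscribers, in Monday-Friday order."""
    start = pd.to_datetime(rides["starttime"], format="%m/%d/%Y %H:%M:%S")
    mask = (start.dt.dayofweek < 5) & (rides["User Type"] == "Subscriber")
    means = (
        rides.loc[mask, "tripduration"]
        .groupby(start[mask].dt.day_name())  # keys are English day names
        .mean()
        .round(2)
    )
    # groupby sorted the names alphabetically; reindex restores calendar order
    # and inserts NaN for weekdays that have no matching trips.
    return means.reindex(WEEKDAYS)


# A few rows copied from the question's sample (all on a Wednesday or a Friday).
rides = pd.DataFrame(
    {
        "tripduration": [732, 322, 1771, 243],
        "starttime": ["7/1/2015 00:00:03", "7/1/2015 00:00:06", "7/1/2015 00:01:25", "7/31/2015 23:57:25"],
        "User Type": ["Subscriber", "Subscriber", "Customer", "Subscriber"],
    }
)

print(weekday_means_by_name(rides))
```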
Answer 1 (score: 2)
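Parse starttime while reading the file (parse_dates expects a list of column names, not a single string):

import numpy as np
import pandas as pd

df = pd.read_csv(...., parse_dates=['starttime'])

Filter with boolean indexing, then groupby dayofweek to compute the mean:

df = df[(df.starttime.dt.dayofweek < 5) & df['User Type'].eq('Subscriber')]
g = np.round(df.groupby(df.starttime.dt.dayofweek).tripduration.mean(), 2)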
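To check the read_csv route without the real file (its path is elided above), you can feed a few of the sample rows through io.StringIO. This is a minimal sketch assuming the file is comma-separated with exactly these three column names; the CSV text is rebuilt from the question's sample rows, not taken from the actual file.

```python
import io

import numpy as np
import pandas as pd

# A few rows from the question's sample, written as CSV text in place of the real file.
csv_text = """tripduration,starttime,User Type
732,7/1/2015 00:00:03,Subscriber
322,7/1/2015 00:00:06,Subscriber
1771,7/1/2015 00:01:25,Customer
243,7/31/2015 23:57:25,Subscriber
1378,7/31/2015 23:57:29,Customer
"""

# parse_dates takes a list of column names; starttime becomes a datetime64 column.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["starttime"])
print(df["starttime"].dtype)

# Same filter and per-weekday mean as in the answer above.
df = df[(df.starttime.dt.dayofweek < 5) & df["User Type"].eq("Subscriber")]
g = np.round(df.groupby(df.starttime.dt.dayofweek).tripduration.mean(), 2)
print(g)
```

With these rows, Wednesday (2) averages the two Subscriber trips 732 and 322, and Friday (4) keeps only the single Subscriber trip 243.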