Finding the mean of weekday groups in a pandas DataFrame

Time: 2017-09-07 08:13:37

Tags: python pandas datetime group-by

My dataset looks like this:

         tripduration           starttime   User Type
0                 732   7/1/2015 00:00:03  Subscriber
1                 322   7/1/2015 00:00:06  Subscriber
2                 790   7/1/2015 00:00:17  Subscriber
3                1228   7/1/2015 00:00:23  Subscriber
4                1383   7/1/2015 00:00:44  Subscriber
5                 603   7/1/2015 00:01:00  Subscriber
6                 520   7/1/2015 00:01:03  Subscriber
7                 289   7/1/2015 00:01:06  Subscriber
8                1771   7/1/2015 00:01:25    Customer
9                 813   7/1/2015 00:01:41  Subscriber
10               1735   7/1/2015 00:01:50    Customer
11                832   7/1/2015 00:01:58  Subscriber
12               1210   7/1/2015 00:02:06  Subscriber
13                746   7/1/2015 00:02:07  Subscriber
14                749   7/1/2015 00:02:26  Subscriber
15                463   7/1/2015 00:02:26  Subscriber
16                331   7/1/2015 00:02:35  Subscriber
17                951   7/1/2015 00:02:43    Customer
18               1352   7/1/2015 00:02:47    Customer
19                275   7/1/2015 00:02:47  Subscriber
20                199   7/1/2015 00:03:05  Subscriber
21                383   7/1/2015 00:03:16    Customer
22               4210   7/1/2015 00:03:27  Subscriber
23                584   7/1/2015 00:03:34  Subscriber
24                735   7/1/2015 00:03:48  Subscriber
25                827   7/1/2015 00:03:56  Subscriber
26                677   7/1/2015 00:03:57  Subscriber
27               2371   7/1/2015 00:03:58    Customer
28                666   7/1/2015 00:04:03  Subscriber
29                999   7/1/2015 00:04:17  Subscriber
...               ...                 ...         ...
1085646           243  7/31/2015 23:57:25  Subscriber
1085647          1378  7/31/2015 23:57:29    Customer
1085648           230  7/31/2015 23:57:32  Subscriber
1085649          1669  7/31/2015 23:57:33  Subscriber
1085650           493  7/31/2015 23:57:44  Subscriber
1085651           822  7/31/2015 23:57:54  Subscriber
1085652           617  7/31/2015 23:58:03  Subscriber
1085653           349  7/31/2015 23:58:08  Subscriber
1085654           818  7/31/2015 23:58:12    Customer
1085655          2062  7/31/2015 23:58:15  Subscriber
1085656           945  7/31/2015 23:58:18    Customer
1085657           346  7/31/2015 23:58:24  Subscriber
1085658           399  7/31/2015 23:58:27  Subscriber
1085659           641  7/31/2015 23:58:42  Subscriber
1085660          1872  7/31/2015 23:58:43  Subscriber
1085661         12065  7/31/2015 23:58:51    Customer
1085662           265  7/31/2015 23:58:53  Subscriber
1085663           936  7/31/2015 23:58:58  Subscriber
1085664           395  7/31/2015 23:59:04  Subscriber
1085665           238  7/31/2015 23:59:10  Subscriber
1085666           551  7/31/2015 23:59:24  Subscriber
1085667           423  7/31/2015 23:59:23    Customer
1085668          1623  7/31/2015 23:59:24  Subscriber
1085669          1632  7/31/2015 23:59:24  Subscriber
1085670           305  7/31/2015 23:59:38  Subscriber
1085671           275  7/31/2015 23:59:40  Subscriber
1085672           530  7/31/2015 23:59:41  Subscriber
1085673           273  7/31/2015 23:59:42    Customer
1085674          1273  7/31/2015 23:59:56  Subscriber
1085675          1667  7/31/2015 23:59:59  Subscriber

My question

What is the average trip duration for Subscribers on any weekday (Monday to Friday)?

My code

The function a4() should return the average (a float rounded to two decimal places):

def a4(rides):
    df1 = rides[rides['User Type'] == 'Subscriber']
    df1['starttime'] = df1['starttime'].apply(pd.to_datetime)  # convert object into datetime

I am stuck here on calculating the mean of tripduration for weekdays (Monday to Friday). I tried to parse starttime with parser.parse(df1['starttime']), but got an error.

What is the correct way to get the weekday average?

2 Answers:

Answer 0: (score: 2)

I think you first need to convert starttime with to_datetime.

Then filter with boolean indexing.
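As a small illustration of these two steps, the sketch below converts a few made-up rows (not taken from the question's data) and builds the boolean mask. The explicit `format` string is an assumption based on the timestamps shown in the question; leaving it out should also work.

```python
import pandas as pd

# Hypothetical sample: two Wednesday rides, one Saturday ride, one Friday ride.
rides = pd.DataFrame({
    "tripduration": [600, 900, 1200, 300],
    "starttime": ["7/1/2015 00:00:03", "7/4/2015 10:15:00",
                  "7/31/2015 23:59:59", "7/1/2015 08:30:00"],
    "User Type": ["Subscriber", "Subscriber", "Subscriber", "Customer"],
})

# Vectorized conversion of the whole column (format assumed from the sample).
start = pd.to_datetime(rides["starttime"], format="%m/%d/%Y %H:%M:%S")

# dayofweek is 0 for Monday ... 6 for Sunday, so < 5 keeps Monday to Friday.
is_weekday = start.dt.dayofweek < 5
is_subscriber = rides["User Type"] == "Subscriber"
mask = is_weekday & is_subscriber

print(mask.tolist())  # [True, False, True, False]
```

The Saturday ride and the Customer ride are both excluded, so only the first and third rows would feed into the mean.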

If you need a single scalar value over all weekdays, select the column with loc and take the mean:

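def a4(rides):
    # Parse the starttime strings into datetime values
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    # dayofweek is 0 for Monday ... 6 for Sunday, so < 5 keeps Monday to Friday
    m = (rides['starttime'].dt.dayofweek < 5) & (rides['User Type'] == 'Subscriber')
    return round(rides.loc[m, 'tripduration'].mean(), 2)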

print (a4(rides))
825.33
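
Note that the function above overwrites the starttime column of the DataFrame passed in. If that side effect is unwanted, one possible variant converts into a local Series instead. This is only a sketch: the name `weekday_subscriber_mean` and the sample rows are made up for illustration.

```python
import pandas as pd

def weekday_subscriber_mean(rides):
    """Mean tripduration of Subscriber rides on Monday to Friday, rounded to 2 decimals."""
    # Convert into a local Series; the caller's DataFrame is left unchanged.
    start = pd.to_datetime(rides["starttime"])
    mask = (start.dt.dayofweek < 5) & (rides["User Type"] == "Subscriber")
    return round(rides.loc[mask, "tripduration"].mean(), 2)

# Hypothetical sample: three weekday Subscriber rides, one Customer ride,
# and one Subscriber ride on a Saturday (7/4/2015).
sample = pd.DataFrame({
    "tripduration": [732, 322, 790, 1771, 500],
    "starttime": ["7/1/2015 00:00:03", "7/1/2015 00:00:06", "7/1/2015 00:00:17",
                  "7/1/2015 00:01:25", "7/4/2015 10:00:00"],
    "User Type": ["Subscriber", "Subscriber", "Subscriber", "Customer", "Subscriber"],
})

print(weekday_subscriber_mean(sample))  # mean of 732, 322 and 790
```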

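If you need the mean for each day separately, keep the dayofweek condition in the filter, then groupby the day of week and aggregate with mean: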

def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    df1 = rides[(rides['User Type'] == 'Subscriber') & (rides['starttime'].dt.dayofweek < 5)]
    return df1.groupby(df1['starttime'].dt.dayofweek)['tripduration'].mean().round(2)

print (a4(rides))
starttime
2    840.96
4    809.71
Name: tripduration, dtype: float64
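
Only days 2 and 4 appear here, presumably because the result was computed from the rows shown in the question, which fall on a Wednesday (7/1/2015) and a Friday (7/31/2015). If a row for every weekday is wanted even when a day has no rides, one option is to reindex the result. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical sample: two Wednesday rides and one Friday ride.
rides = pd.DataFrame({
    "tripduration": [600, 800, 1000],
    "starttime": pd.to_datetime(["2015-07-01 00:00:03",
                                 "2015-07-01 09:00:00",
                                 "2015-07-31 23:59:59"]),
    "User Type": ["Subscriber"] * 3,
})

per_day = rides.groupby(rides["starttime"].dt.dayofweek)["tripduration"].mean().round(2)

# Reindex to Monday (0) through Friday (4); days without rides become NaN.
per_day = per_day.reindex(range(5))
print(per_day)
```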

If you want day names instead of day numbers, group by dt.day_name() (in pandas older than 0.23, use the weekday_name attribute instead):

def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    df1 = rides[(rides['User Type'] == 'Subscriber') & (rides['starttime'].dt.dayofweek < 5)]
    return df1.groupby(df1['starttime'].dt.day_name())['tripduration'].mean().round(2)

print (a4(rides))
starttime
Friday       809.71
Wednesday    840.96
Name: tripduration, dtype: float64
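
Grouping by name sorts the result alphabetically, which is why Friday comes before Wednesday above. One possible way to get calendar order back is to reindex by a list of weekday names. The sketch below uses made-up data and `dt.day_name()`, which is available in pandas 0.23 and later:

```python
import pandas as pd

# Hypothetical sample: two Wednesday rides and one Friday ride.
rides = pd.DataFrame({
    "tripduration": [600, 800, 1000],
    "starttime": pd.to_datetime(["2015-07-01 00:00:03",
                                 "2015-07-01 09:00:00",
                                 "2015-07-31 23:59:59"]),
    "User Type": ["Subscriber"] * 3,
})

names = rides["starttime"].dt.day_name()
per_day = rides.groupby(names)["tripduration"].mean().round(2)

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
# Keep only the weekdays that actually occur, in calendar order.
present = [day for day in weekday_order if day in per_day.index]
per_day = per_day.reindex(present)
print(per_day)
```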

Answer 1: (score: 2)

# parse_dates expects a list of column names, not a bare string
df = pd.read_csv(...., parse_dates=['starttime'])

Filter with boolean indexing, then groupby dayofweek to compute the mean:

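# Keep Subscriber rides on Monday (0) to Friday (4)
df = df[(df.starttime.dt.dayofweek < 5) & df['User Type'].eq('Subscriber')]
# Mean trip duration per weekday, rounded to two decimals (needs numpy imported as np)
g = np.round(df.groupby(df.starttime.dt.dayofweek).tripduration.mean(), 2)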
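
To try this approach end to end without the original file, the sketch below reads a few made-up rows from an in-memory string. The CSV layout is an assumption based on the columns shown in the question, and `Series.round` is used here in place of `np.round` so that numpy does not need to be imported.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the real file.
csv_text = """tripduration,starttime,User Type
732,7/1/2015 00:00:03,Subscriber
322,7/1/2015 00:00:06,Subscriber
1771,7/1/2015 00:01:25,Customer
500,7/4/2015 10:00:00,Subscriber
243,7/31/2015 23:57:25,Subscriber
"""

# parse_dates takes a list of column names to convert while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["starttime"])

# Keep Subscriber rides on Monday (0) to Friday (4); 7/4/2015 is a Saturday.
df = df[(df["starttime"].dt.dayofweek < 5) & df["User Type"].eq("Subscriber")]

g = df.groupby(df["starttime"].dt.dayofweek)["tripduration"].mean().round(2)
print(g)
```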