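I know this is a very common question, and there are many threads about averaging time and datetime values. Unfortunately, I am stuck with my approach and would like to ask for your help with the following task:
As a result of the line: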
time_to_rent = user_payments[user_payments.rentComplete].groupby(['rentId','creditCardId','rentComplete'], as_index=False).agg({'createdAt': np.min, 'updatedAt': np.max})
I get this dataframe (shown in dict format):
time_to_rent = {'rentId': {0: 43.0, 1: 87.0, 2: 140.0, 3: 454.0, 4: 1458.0}, 'creditCardId': {0: 40, 1: 40, 2: 40, 3: 40, 4: 40}, 'rentComplete': {0: True, 1: True, 2: True, 3: True, 4: False}, 'createdAt': {0: Timestamp('2020-08-24 16:13:11.850216'), 1: Timestamp('2020-09-10 10:47:31.748628'), 2: Timestamp('2020-09-13 15:29:06.077622'), 3: Timestamp('2020-09-24 08:08:39.852348'), 4: Timestamp('2020-10-19 08:54:09.891518')}, 'updatedAt': {0: Timestamp('2020-08-24 20:26:31.805939'), 1: Timestamp('2020-09-10 20:05:18.759421'), 2: Timestamp('2020-09-13 18:38:10.044112'), 3: Timestamp('2020-09-24 08:53:22.512533'), 4: Timestamp('2020-10-19 09:24:03.982986')}, 'rent_time': {0: Timedelta('0 days 04:13:19.955723'), 1: Timedelta('0 days 09:17:47.010793'), 2: Timedelta('0 days 03:09:03.966490'), 3: Timedelta('0 days 00:44:42.660185'), 4: Timedelta('0 days 00:29:54.091468')}}
Then I add one more column:
time_to_rent['rent_time'] = time_to_rent['updatedAt'] - time_to_rent['createdAt']
I want to group time_to_rent by 'creditCardId' and take the mean of the 'rent_time' column.
This code returns an error:
average_per_user = time_to_rent.groupby('creditCardId').agg({'rent_time' : np.mean})
This is the error returned:
~\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _cython_agg_general(self, how, alt, numeric_only, min_count)
906
907 if len(output) == 0:
--> 908 raise DataError("No numeric types to aggregate")
909
910 return self._wrap_aggregated_output(output)
DataError: No numeric types to aggregate
I am not sure why len(output) equals 0 ...
Answer 0: (score: 1)
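If I understand correctly, you need to exclude the False values of df.rentComplete (and finish the whole thing with …). In pandas, filtering on a boolean column is straightforward: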
average_per_user = time_to_rent[time_to_rent.rentComplete] \
.groupby('creditCardId').agg({'rent_time' : np.mean})
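As a hedged sketch of the same idea that does not depend on how a given pandas version averages timedeltas inside groupby, the durations can be converted to float seconds, averaged, and converted back. The small frame below is made-up illustrative data, not the asker's; only the column names come from the question.

```python
import pandas as pd

# Illustrative data only: two cards, one incomplete rent (values are made up).
time_to_rent = pd.DataFrame({
    "creditCardId": [40, 40, 40, 41],
    "rentComplete": [True, True, False, True],
    "rent_time": pd.to_timedelta(["04:00:00", "02:00:00", "00:30:00", "01:00:00"]),
})

# Keep only completed rents, as in the answer above.
completed = time_to_rent[time_to_rent["rentComplete"]]

# Convert the timedeltas to float seconds so the mean is a plain numeric aggregation.
mean_seconds = (
    completed["rent_time"].dt.total_seconds()
    .groupby(completed["creditCardId"])
    .mean()
)

# Convert the averaged seconds back to Timedelta values.
average_per_user = pd.to_timedelta(mean_seconds, unit="s")
print(average_per_user)
```

The result is a Series indexed by creditCardId. Going through seconds loses nothing meaningful here, and the mean then runs on an ordinary float column.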
Answer 1: (score: 0)
Try this:
average_per_user = time_to_rent.groupby('creditCardId').mean()['rent_time']
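A possible variant, offered as a sketch rather than a verified fix for the pandas version in the question: select the 'rent_time' column before aggregating and pass numeric_only=False to mean, so the timedelta column is not skipped as non-numeric. The data below is made up for illustration, and behaviour may differ between pandas releases.

```python
import pandas as pd

# Illustrative data only (values are made up).
time_to_rent = pd.DataFrame({
    "creditCardId": [40, 40, 41],
    "rent_time": pd.to_timedelta(["04:00:00", "02:00:00", "01:00:00"]),
})

# Select the timedelta column first, then request a mean that is
# not restricted to numeric dtypes.
average_per_user = (
    time_to_rent.groupby("creditCardId")["rent_time"].mean(numeric_only=False)
)
print(average_per_user)
```

Selecting the column first also keeps unrelated columns such as the timestamps out of the aggregation.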