我有几个包含列的数据框:coupon_id
和rating
。
我想合并这些数据框,并将所有coupon_id
和rating
的一个数据框作为所有数据框中此coupon_id
的所有评级的总和。
例如。假设我有2个数据帧:
| coupon_id | rating | |:-----------|------------:| | 1 | 40 | | 2 | 60 | | 3 | 50 |
| coupon_id | rating | |:-----------|------------:| | 4 | 70 | | 2 | 80 | | 3 | 60 |
结果我想得到这个数据帧:
| coupon_id | rating | |:-----------|------------:| | 1 | 40 | | 2 | 140 | | 3 | 110 | | 4 | 70 |
对于这个问题,我使用这个代码,它可以工作,但效率很低
similar_users_ratings = pd.DataFrame(columns=['coupon_id', 'rating']) for similarUser in most_similar_users: similar_user_ratings = self.ratingData.loc[self.ratingData['patient_id'] == similarUser[0], :].copy() similar_user_ratings.loc[:, 'rating'] = similar_user_ratings.loc[:, 'rating'].apply(lambda x: int(x) * similarUser[1]) del similar_user_ratings['patient_id'] similar_users_ratings = similar_users_ratings.merge(similar_user_ratings, on='coupon_id', how='outer') similar_users_ratings['rating_y'].fillna(.0, inplace=True) similar_users_ratings['rating_x'].fillna(.0, inplace=True) similar_users_ratings['rating'] = similar_users_ratings['rating_x'] + similar_users_ratings['rating_y'] del similar_users_ratings['rating_y'] del similar_users_ratings['rating_x']
如何简化这段代码?感谢。
实际上我有几个数据帧,例如:
coupon_id rating 69 12 1 coupon_id rating 101 37 1 coupon_id rating 428 11 1 coupon_id rating 1133 11 1
所需数据集:
coupon_id rating 12 1 37 1 11 2
答案 0 :(得分:1)
<强>更新强>
In [46]: d1
Out[46]:
coupon_id rating
69 12 1
In [47]: d2
Out[47]:
coupon_id rating
101 37 1
In [48]: d3
Out[48]:
coupon_id rating
428 11 1
In [49]: d4
Out[49]:
coupon_id rating
1133 11 1
In [50]: pd.concat([d1,d2,d3,d4],ignore_index=True).groupby('coupon_id', as_index=False)['rating'].sum(
Out[50]:
coupon_id rating
0 11 2
1 12 1
2 37 1
OLD回答:
In [219]: d1.set_index('coupon_id').add(d2.set_index('coupon_id'), fill_value=0) \
.reset_index()
Out[219]:
coupon_id rating
0 1 40.0
1 2 140.0
2 3 110.0
3 4 70.0