Question

srch_destination     hotel_booked        count
28                   1                   4
28                   5                   1
28                   8                   2
28                   11                  9
28                   14                  17
19                   11                  3
19                   2                   5
19                   5                   8
19                   6                   10

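Suppose I have a dataframe formatted as above. These are searches, so, for example, 4 people who searched for destination 28 booked hotel 1. I basically want to get a dataframe with one row per search destination, along with the corresponding top 3 bookings. So for this dataframe, we would have two rows that look like: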

srch_destination    top_hotels
28                  14 11 1
19                  6 5 2

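Currently my code is below, where "c_id" is the initial dataframe and "a" is the desired output. I'm coming from R, and I'd like to know whether there is a more efficient way to do the sorting and subsequent aggregation.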

import numpy as np
import pandas as pd

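rows = []

# Build one output row per search destination.
for ind in np.unique(c_id.srch_destination):
    # Hotels with the 3 largest counts for this destination.
    nlarg = c_id[c_id.srch_destination == ind].sort_values('count', ascending=False).head(3)['hotel_booked']
    rows.append({'srch_destination': ind, 'top_hotels': " ".join(map(str, nlarg))})

# DataFrame.append was removed in pandas 2.0, so build the frame once from the list.
a = pd.DataFrame(rows)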

a.to_csv('out.csv')

Answer 1

Use nlargest to get the top 3 based on the count column.

>>> (df.groupby('srch_destination')
       .apply(lambda group: group.nlargest(3, 'count').hotel_booked.tolist()))
srch_destination
19      [6, 5, 2]
28    [14, 11, 1]
dtype: object
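
If you need the exact layout from the question (one row per destination, with the hotels joined into a single space-separated string), something along these lines might work. This is only a sketch: it rebuilds the sample data from the question under the name df, and it swaps nlargest for a single sort followed by groupby(...).head(3), which is a different technique from the one shown above.

```python
import pandas as pd

# Sample data copied from the question (the name df is an assumption).
df = pd.DataFrame({
    'srch_destination': [28, 28, 28, 28, 28, 19, 19, 19, 19],
    'hotel_booked':     [1, 5, 8, 11, 14, 11, 2, 5, 6],
    'count':            [4, 1, 2, 9, 17, 3, 5, 8, 10],
})

# Sort once by count, keep the first 3 rows of each destination, then join
# the hotel ids of each destination into one space-separated string.
top_hotels = (
    df.sort_values('count', ascending=False)
      .groupby('srch_destination', sort=False)
      .head(3)
      .groupby('srch_destination', sort=False)['hotel_booked']
      .agg(lambda s: ' '.join(map(str, s)))
      .reset_index(name='top_hotels')
)

print(top_hotels)
```

For the sample data this gives "14 11 1" for destination 28 and "6 5 2" for destination 19. Calling top_hotels.to_csv('out.csv', index=False) would then write the same file the loop in the question produces, minus the index column.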

Efficiently sorting and aggregating data in Python?
