Question

I want to group by ID and get the three most frequent cities. For example, I have this original DataFrame:

  ID    City
    1    London
    1    London
    1    New York
    1    London
    1    New York
    1    Berlin
    2    Shanghai
    2    Shanghai

The result I want looks like this:

ID first_frequent_city   second_frequent_city   third_frequent_city
1   London               New York               Berlin
2   Shanghai             NaN                    NaN
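
To try the answers below, the sample frame from the question can be rebuilt roughly as follows. This is only a reproduction of the table above; the variable name `df` is taken from the answers.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 2],
    "City": ["London", "London", "New York", "London",
             "New York", "Berlin", "Shanghai", "Shanghai"],
})
```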

Answer 1

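The first step is to use SeriesGroupBy.value_counts to count the values of City per ID; the advantage is that the values are already sorted. Then get a counter with GroupBy.cumcount, filter the first 3 values with loc, pivot with DataFrame.pivot, change the column names, and finally convert ID to a column with DataFrame.reset_index.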
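
The code for this answer did not survive on this page, so here is a minimal sketch of the steps just described, not the answerer's exact code. The helper column name `rank` and the intermediate name `ranked` are assumptions made for illustration, and the sample data is rebuilt from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 2],
    "City": ["London", "London", "New York", "London",
             "New York", "Berlin", "Shanghai", "Shanghai"],
})

ranked = (
    df.groupby("ID")["City"].value_counts()   # counts per ID, most frequent first
      .groupby(level=0).cumcount()            # 0, 1, 2, ... position within each ID
      .loc[lambda s: s < 3]                   # keep only the first three per ID
      .reset_index(name="rank")               # "rank" is a hypothetical column name
)

result = (
    ranked.pivot(index="ID", columns="rank", values="City")
          .rename(columns={0: "first_frequent_city",
                           1: "second_frequent_city",
                           2: "third_frequent_city"})
          .rename_axis(None, axis=1)          # drop the "rank" label on the columns
          .reset_index()                      # turn ID back into a regular column
)
print(result)
```

Cities with equal counts inside one ID may come out in either order, since the ranking only follows the order that value_counts returns.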


Answer 2

Another way is to use count as a reference for sorting, then recreate the DataFrame by looping through the groupby object:

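import numpy as np
import pandas as pd

# count each (ID, City) pair, keep one row per pair, most frequent first
df = (df.assign(count=df.groupby(["ID","City"])["City"].transform("count"))
        .drop_duplicates(["ID","City"])
        .sort_values(["ID","count"], ascending=False))

# first three cities per ID; missing entries are filled with NaN
print(pd.DataFrame([i["City"].unique()[:3] for _, i in df.groupby("ID")]).fillna(np.nan))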

          0         1       2
0    London  New York  Berlin
1  Shanghai       NaN     NaN

Answer 3

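It's a bit long; essentially you group twice. The first part works on the premise that grouping sorts the data in ascending order, and the second part lets us split the data into individual columns: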

(df
.groupby("ID")
.tail(3)  # last three rows of each ID; this relies on row order, not on counts
.drop_duplicates()
.groupby("ID")
.agg(",".join)
.City.str.split(",", expand=True)
.set_axis(["first_frequent_city",
           "second_frequent_city", 
           "third_frequent_city"],
           axis="columns",)
)


     first_frequent_city    second_frequent_city    third_frequent_city
ID          
1      London                 New York                Berlin
2      Shanghai               None                    None

Answer 4

Get the .count by ID and City, then use np.where() together with .groupby() and max, median and min. Then set the index and unstack the rows into columns on the max column.
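
The original code and output for this answer are missing here, so the following is only a sketch of how the description could be read, not the answerer's code. It assumes each ID has at most three distinct cities with distinct counts, so that the per-ID max, median and min each pick out one city. The names `counts`, `per_id` and `label` are hypothetical (the description above suggests the original kept the labels in a column called max).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 2],
    "City": ["London", "London", "New York", "London",
             "New York", "Berlin", "Shanghai", "Shanghai"],
})

# One row per (ID, City) with its number of occurrences
counts = df.groupby(["ID", "City"])["City"].count().reset_index(name="count")

# Compare each count with the max / median / min count of its ID.
# max is tested first, so an ID with a single city is labelled "first".
per_id = counts.groupby("ID")["count"]
counts["label"] = np.where(
    counts["count"] == per_id.transform("max"), "first_frequent_city",
    np.where(counts["count"] == per_id.transform("median"), "second_frequent_city",
             np.where(counts["count"] == per_id.transform("min"),
                      "third_frequent_city", None)))

cols = ["first_frequent_city", "second_frequent_city", "third_frequent_city"]
result = (
    counts.dropna(subset=["label"])
          .set_index(["ID", "label"])["City"]
          .unstack("label")                 # labels become columns
          .reindex(columns=cols)            # fix the column order
          .rename_axis(None, axis=1)
          .reset_index()
)
print(result)
```

With tied counts or more than three cities per ID the labels would collide or leave cities unlabelled, so this reading only suits data shaped like the example in the question.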


How to group by and get the three most frequent values?

4 answers: