Question

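I'm trying to get the most popular names per country using pandas. The snippet below gets me about halfway there, but I'm not clear on how to turn groupedByCountry into a sorted table.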

import pandas

csv = pandas.read_csv("./name_country.csv.gz", compression="gzip")

data = csv[["name", "country"]]

# Keep only the rows that have a country
filtered = data[data.country.notnull()]

groupedByCountry = filtered.groupby("country")

Answer 1

You can use groupby with size and then nlargest:

In [11]: df = pd.DataFrame([["andy", "GB"], ["bob", "US"], ["chris", "GB"]], columns=["name", "country"])

In [12]: df.groupby("country").size().nlargest(1)
Out[12]:
country
GB    2
dtype: int64

However, it is probably more efficient to call value_counts directly on the column and then take head (head(n) returns the n most popular countries):

In [21]: df["country"].value_counts().head(1)
Out[21]:
GB    2
Name: country, dtype: int64
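
Both snippets above rank countries. If what you want is the most popular names within each country, as a sorted table, one possible approach is to count each (country, name) pair and then sort. This is only a minimal sketch: the sample rows and the names counts, sorted_counts and top_names are made up for illustration and do not come from the question's data.

```python
import pandas as pd

# Hypothetical sample rows standing in for name_country.csv.gz
df = pd.DataFrame(
    [["andy", "GB"], ["bob", "US"], ["chris", "GB"],
     ["andy", "GB"], ["bob", "GB"], ["bob", "US"]],
    columns=["name", "country"],
)

# Count how often each name occurs in each country, then flatten
# the grouped result back into an ordinary table.
counts = (
    df.dropna(subset=["country"])
      .groupby(["country", "name"])
      .size()
      .reset_index(name="count")
)

# Sorted table: countries alphabetically, most frequent names first.
sorted_counts = counts.sort_values(["country", "count"], ascending=[True, False])

# Keep the top n names per country (n = 1 here).
top_names = sorted_counts.groupby("country").head(1).reset_index(drop=True)
print(top_names)
```

Changing head(1) to head(n) keeps the n most frequent names per country. Which of several tied names is kept is not guaranteed.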
