Question

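I have a DataFrame of people with many columns, including the city each person lives in and the type of pet they prefer. I want to find the city where a given type of pet is most common.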

City             Pet
New York         Cat
Washington       Dog
Boston           Dog
New York         Cat
Atlanta          Cat
New York         Dog
Atlanta          Dog
Boston           Dog

So in this case, New York has the most cats and Boston has the most dogs.

How can I determine which city has the most cats in a larger DataFrame?
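
For anyone who wants to reproduce this, the sample table above can be built roughly as follows. This is only a sketch: the variable name df is an assumption chosen to match what the answers use, and the real DataFrame has more columns than these two.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "City": ["New York", "Washington", "Boston", "New York",
             "Atlanta", "New York", "Atlanta", "Boston"],
    "Pet": ["Cat", "Dog", "Dog", "Cat", "Cat", "Dog", "Dog", "Dog"],
})

# Number of rows for each (City, Pet) pair, as a quick sanity check
counts = df.groupby(["City", "Pet"]).size()
print(counts)
```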

Answer 1

If you want the full list sorted by the number of cats, you can do the following:

In [38]: df.groupby('City').Pet.value_counts().unstack().sort_values(by='Cat', ascending=False)
Out[38]:
Pet         Cat  Dog
City
New York    2.0  1.0
Atlanta     1.0  1.0
Boston      NaN  2.0
Washington  NaN  1.0
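
The NaN entries above mark city/pet pairs that never occur, and they also turn the counts into floats. If zeros are preferred, one option is to pass fill_value=0 to unstack. A minimal sketch, assuming the two-column sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["New York", "Washington", "Boston", "New York",
             "Atlanta", "New York", "Atlanta", "Boston"],
    "Pet": ["Cat", "Dog", "Dog", "Cat", "Cat", "Dog", "Dog", "Dog"],
})

counts = (
    df.groupby("City")["Pet"]
      .value_counts()
      .unstack(fill_value=0)  # missing city/pet pairs become 0 instead of NaN
      .sort_values(by="Cat", ascending=False)
)
print(counts)
```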

If you only want the top one, you can use nlargest:

In [45]: df.groupby('City').Pet.value_counts().unstack().nlargest(1, 'Cat')
Out[45]:
Pet      Cat  Dog
City
New York 2.0  1.0

Or you can do the same thing, but focus on cats from the start:

In [62]: df[df.Pet == 'Cat'].groupby('City').count().nlargest(1, 'Pet')
Out[62]:
         Pet
City
New York   2

If you don't care about the actual numbers and only want the city, you can use idxmax:

In [51]: df.groupby('City').Pet.value_counts().unstack().idxmax()
Out[51]:
Pet
Cat   New York
Dog     Boston

If you want to do what the last example does, but only for cats, you can also do this:

In [60]: df[df.Pet == 'Cat'].groupby('City').count().idxmax()
Out[60]:
Pet    New York
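
When the DataFrame has many columns, groupby('City').count() returns one count per column, so idxmax gives one answer per column too. A sketch that returns a single city name is to select the City column first and use value_counts. The helper name city_with_most and the extra Age column are hypothetical, added only for illustration:

```python
import pandas as pd

def city_with_most(df, pet):
    """Return the city with the most rows for `pet` (hypothetical helper)."""
    cities = df.loc[df["Pet"] == pet, "City"]
    if cities.empty:
        return None  # no rows at all for this pet
    return cities.value_counts().idxmax()

df = pd.DataFrame({
    "City": ["New York", "Washington", "Boston", "New York",
             "Atlanta", "New York", "Atlanta", "Boston"],
    "Pet": ["Cat", "Dog", "Dog", "Cat", "Cat", "Dog", "Dog", "Dog"],
    "Age": [31, 45, 27, 52, 38, 29, 60, 41],  # made-up extra column
})

print(city_with_most(df, "Cat"))
print(city_with_most(df, "Dog"))
```

If several cities are tied for the top count, this returns only one of them.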

Answer 2

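Try this: nlargest returns the top n results. Here n is 1, so only the top result is returned. I set animal to "Cat"; change it to "Dog" to see the results for dogs.

animal = "Cat"
df2 = df[df["Pet"] == animal].groupby('City').count().rename(columns={"Pet": animal}).nlargest(1, animal)

If you want the search to be case-insensitive (so that "Cat", "CAT", or "cat" all match), compare lowercased values. This version also returns the top three cities instead of only the first:

df2 = df[df["Pet"].str.lower() == animal.lower()].groupby('City').count().rename(columns={"Pet": animal}).nlargest(3, animal)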
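
One thing to watch with nlargest is ties: by default it keeps only the first n rows, so cities tied at the cutoff are dropped. A sketch using the keep='all' option, assuming the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["New York", "Washington", "Boston", "New York",
             "Atlanta", "New York", "Atlanta", "Boston"],
    "Pet": ["Cat", "Dog", "Dog", "Cat", "Cat", "Dog", "Dog", "Dog"],
})

animal = "dog"
mask = df["Pet"].str.lower() == animal.lower()  # case-insensitive match
counts = df[mask].groupby("City").size().to_frame(animal)

top_first = counts.nlargest(2, animal)             # exactly 2 rows
top_all = counts.nlargest(2, animal, keep="all")   # keeps every city tied at the cutoff
print(top_all)
```

Here Boston has two dogs and three other cities have one each, so keep='all' returns four rows rather than two.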

Answer 3

I'm sure there are more Pythonic ways, but this should do what you need.

# Count how many times each (Pet, City) pair occurs
data = df.groupby(['Pet', 'City']).size().to_frame('cnt')
# Highest count for each pet, repeated on every row of that pet
data['maximum'] = data.groupby(level='Pet')['cnt'].transform('max')
# Boolean mask that is True where a city's count equals the maximum for its pet
bm = data['maximum'] == data['cnt']
# Cities with the highest count for each pet (ties are all kept)
data.loc[bm].reset_index()[['Pet', 'City']]
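
The same per-pet maximum idea can be written more compactly with a grouped transform. This is a sketch on made-up data, chosen so that the top count differs between pets and the cats are tied; the column names City and Pet are taken from the question:

```python
import pandas as pd

# Made-up data: dogs peak at 3 (Boston), cats are tied at 1
df = pd.DataFrame({
    "City": ["Boston", "Boston", "Boston", "New York", "New York", "Atlanta"],
    "Pet": ["Dog", "Dog", "Dog", "Dog", "Cat", "Cat"],
})

# One row per (Pet, City) pair with its count
counts = df.groupby(["Pet", "City"]).size().rename("cnt").reset_index()

# Compare each count with the maximum count within the same pet
is_max = counts["cnt"] == counts.groupby("Pet")["cnt"].transform("max")

result = counts.loc[is_max, ["Pet", "City"]]
print(result)
```

Because the comparison is made within each pet, both tied cat cities are kept even though their count is lower than Boston's dog count.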

Answer 4

We can use mode, which gives the most common pet in each city:

df.groupby('City').Pet.apply(lambda x : pd.Series.mode(x)[0])
City
Atlanta       Cat
Boston        Dog
New York      Cat
Washington    Dog
Name: Pet, dtype: object
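
The question asks for the reverse direction, the most common city for each pet, which can be sketched by grouping on Pet instead. Returning the whole mode as a list keeps ties visible. This assumes the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["New York", "Washington", "Boston", "New York",
             "Atlanta", "New York", "Atlanta", "Boston"],
    "Pet": ["Cat", "Dog", "Dog", "Cat", "Cat", "Dog", "Dog", "Dog"],
})

# For each pet, list every city tied for the highest count
top_city = df.groupby("Pet")["City"].apply(lambda s: s.mode().tolist())
print(top_city)
```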

Find the most common pair from two DataFrame columns
