How do I use value_counts() with groupby().transform() in pandas?

Asked: 2017-12-20 04:25:15

Tags: python pandas dataframe group-by pandas-groupby

I am working with a pandas DataFrame df1 of item prices. This is the result I want:

  Item    Price  Minimum Most_Common_Price
0 Coffee  1      1       2
1 Coffee  2      1       2
2 Coffee  2      1       2
3 Tea     3      3       4
4 Tea     4      3       4
5 Tea     4      3       4

I create Minimum with:

df1["Minimum"] = df1.groupby(["Item"])['Price'].transform("min")

How can I create Most_Common_Price? This attempt fails:

df1["Most_Common_Price"] = df1.groupby(["Item"])['Price'].transform(value_counts()) # Doesn't work

For now I use a multi-step approach:

for item in df1.Item.unique():            # visit each distinct item
    mask = df1.Item == item               # rows belonging to this item
    df1.loc[mask, "Most_Common_Price"] = df1.loc[mask, "Price"].value_counts().idxmax()

This is overkill. There must be a simpler way, ideally a one-liner.


2 Answers:

Answer 0 (score: 5)

You can use groupby + transform + value_counts + idxmax:

df['Most_Common_Price'] = \
      df.groupby('Item').Price.transform(lambda x: x.value_counts().idxmax())

df

     Item  Price  Minimum  Most_Common_Price
0  Coffee      1        1                  2
1  Coffee      2        1                  2
2  Coffee      2        1                  2
3     Tea      3        3                  4
4     Tea      4        3                  4
5     Tea      4        3                  4
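
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the question's sample data, and the helper name most_common is only an illustrative choice, not part of pandas.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4],
})

def most_common(prices: pd.Series):
    # Hypothetical helper: value_counts() indexes the counts by price,
    # so idxmax() returns the price with the highest count.
    return prices.value_counts().idxmax()

# transform() broadcasts the per-group scalar back to every row of the group
df["Minimum"] = df.groupby("Item")["Price"].transform("min")
df["Most_Common_Price"] = df.groupby("Item")["Price"].transform(most_common)

print(df)
```

Because the function returns one scalar per group, transform repeats that value on each row of the group, which is why the new column lines up with the original index.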

An improvement (thanks, Vaishali!) uses pd.Series.map:

df['Most_Common_Price'] = df['Item'].map(
       df.groupby('Item').Price.agg(lambda x: x.value_counts().idxmax()))
df

     Item  Price  Minimum  Most_Common_Price
0  Coffee      1        1                  2
1  Coffee      2        1                  2
2  Coffee      2        1                  2
3     Tea      3        3                  4
4     Tea      4        3                  4
5     Tea      4        3                  4
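
The map variant is easier to follow when split into its two steps: aggregate to one value per item, then look that value up for each row. A rough sketch, assuming the question's sample data (the intermediate name per_item is made up for illustration):

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4],
})

# Step 1: one scalar per group, giving a Series indexed by Item
per_item = df.groupby("Item")["Price"].agg(lambda x: x.value_counts().idxmax())

# Step 2: look up each row's Item in that Series
df["Most_Common_Price"] = df["Item"].map(per_item)

print(per_item)
print(df)
```

Keeping per_item around can also be handy if you need the per-item result on its own, without the row-level column.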

Answer 1 (score: 4)

A nice way is to use pd.Series.mode, if what you want is the most common element (i.e. the mode).

In [32]: df
Out[32]:
     Item  Price  Minimum
0  Coffee      1        1
1  Coffee      2        1
2  Coffee      2        1
3     Tea      3        3
4     Tea      4        3
5     Tea      4        3

In [33]: df['Most_Common_Price'] = df.groupby(["Item"])['Price'].transform(pd.Series.mode)

In [34]: df
Out[34]:
     Item  Price  Minimum  Most_Common_Price
0  Coffee      1        1                  2
1  Coffee      2        1                  2
2  Coffee      2        1                  2
3     Tea      3        3                  4
4     Tea      4        3                  4
5     Tea      4        3                  4

As @Wen points out, pd.Series.mode can return a pd.Series with several values when there is a tie, so just take the first one:

In [67]: df
Out[67]:
     Item  Price  Minimum
0  Coffee      1        1
1  Coffee      2        1
2  Coffee      2        1
3     Tea      3        3
4     Tea      4        3
5     Tea      4        3
6     Tea      3        3

In [68]: df[df.Item =='Tea'].Price.mode()
Out[68]:
0    3
1    4
dtype: int64

In [69]: df['Most_Common_Price'] = df.groupby(["Item"])['Price'].transform(lambda S: S.mode()[0])

In [70]: df
Out[70]:
     Item  Price  Minimum  Most_Common_Price
0  Coffee      1        1                  2
1  Coffee      2        1                  2
2  Coffee      2        1                  2
3     Tea      3        3                  3
4     Tea      4        3                  3
5     Tea      4        3                  3
6     Tea      3        3                  3
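
The tie case above can be reproduced as a standalone script. This is only a sketch: the data is the seven-row frame from the session, and first_mode is a hypothetical helper name. It uses .iloc[0] rather than [0] so that the lookup is explicitly positional.

```python
import pandas as pd

# Seven-row frame from the session above: Tea has a tie between 3 and 4
df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4, 3],
})

def first_mode(prices: pd.Series):
    # Hypothetical helper: mode() returns all modes in sorted order,
    # so the first entry is the smallest of the tied values.
    modes = prices.mode()
    return modes.iloc[0]

df["Most_Common_Price"] = df.groupby("Item")["Price"].transform(first_mode)

# Both tied values show up when mode() is called directly
tea_modes = df.loc[df["Item"] == "Tea", "Price"].mode().tolist()

print(tea_modes)
print(df)
```

Since mode() sorts its result, ties are always resolved toward the lowest price here. If a group could be empty or all-NaN, mode() would return an empty Series and .iloc[0] would raise, so that case would need its own handling.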