I am working with a pandas DataFrame, df1, that holds item prices.
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
I create Minimum with:
df1["Minimum"] = df1.groupby(["Item"])['Price'].transform(min)
How do I create Most_Common_Price?
df1["Minimum"] = df1.groupby(["Item"])['Price'].transform(value_counts()) # Doesn't work
At the moment I use a multi-step approach:
for item in df1.Item.unique().tolist():
    subset = df1[df1.Item == item]  # rows belonging to this item
    df1.loc[df1.Item == item, "Most_Common_Price"] = subset.Price.value_counts().idxmax()  # most frequent price
That is overkill. There must be a simpler way, ideally a one-liner.
How do I combine groupby().transform() with value_counts() in pandas?
Answer 0 (score: 5)
You can use groupby + transform + value_counts + idxmax:
df['Most_Common_Price'] = \
df.groupby('Item').Price.transform(lambda x: x.value_counts().idxmax())
df
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
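For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample data is rebuilt from the question's table, and the helper name most_common is only illustrative, not part of pandas.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4],
})

def most_common(prices: pd.Series):
    # Hypothetical helper: value_counts() maps each distinct price to how
    # often it occurs; idxmax() returns the label (the price itself) that
    # has the largest count.
    return prices.value_counts().idxmax()

# transform() broadcasts the per-group scalar back onto every row of the group
df["Minimum"] = df.groupby("Item")["Price"].transform("min")
df["Most_Common_Price"] = df.groupby("Item")["Price"].transform(most_common)
print(df)
```

The reason value_counts() cannot be passed to transform directly is that it returns one row per distinct price rather than one value per group; idxmax() reduces it to a single scalar that transform can broadcast.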
An improvement (thanks, Vaishali!) uses pd.Series.map:
df['Most_Common_Price'] = df['Item'].map(
    df.groupby('Item').Price.agg(lambda x: x.value_counts().idxmax()))
df
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
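The map variant can be read as two steps: aggregate once per item, then look each row's item up in the result. The sketch below spells that out on the question's sample data; the intermediate name price_by_item is an assumption introduced only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4],
})

# Step 1: one value per group, as a Series indexed by Item
price_by_item = df.groupby("Item")["Price"].agg(
    lambda x: x.value_counts().idxmax()
)

# Step 2: map() looks up each row's Item in that Series
df["Most_Common_Price"] = df["Item"].map(price_by_item)
print(price_by_item)
print(df)
```

Keeping the lookup Series around can also be handy if the same per-item value is needed elsewhere, for example in a second DataFrame keyed by Item.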
Answer 1 (score: 4)
A nice approach is to use pd.Series.mode, if what you want is the most common element (i.e. the mode).
In [32]: df
Out[32]:
Item Price Minimum
0 Coffee 1 1
1 Coffee 2 1
2 Coffee 2 1
3 Tea 3 3
4 Tea 4 3
5 Tea 4 3
In [33]: df['Most_Common_Price'] = df.groupby(["Item"])['Price'].transform(pd.Series.mode)
In [34]: df
Out[34]:
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
As @Wen points out, pd.Series.mode can return a pd.Series holding several values when there is a tie, so just grab the first one:
In [67]: df
Out[67]:
Item Price Minimum
0 Coffee 1 1
1 Coffee 2 1
2 Coffee 2 1
3 Tea 3 3
4 Tea 4 3
5 Tea 4 3
6 Tea 3 3
In [68]: df[df.Item =='Tea'].Price.mode()
Out[68]:
0 3
1 4
dtype: int64
In [69]: df['Most_Common_Price'] = df.groupby(["Item"])['Price'].transform(lambda S: S.mode()[0])
In [70]: df
Out[70]:
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 3
4 Tea 4 3 3
5 Tea 4 3 3
6 Tea 3 3 3
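The tie case above can be reproduced with the short sketch below. It assumes the seven-row data shown in Out[67]; the helper name first_mode is illustrative only. Since Series.mode returns the tied values in sorted order, taking the first one picks the smallest of the tied prices, which may or may not be the tie-break you want.

```python
import pandas as pd

# Seven-row sample from the tie example: Tea has prices 3 and 4 twice each
df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4, 3],
})

def first_mode(prices: pd.Series):
    # Hypothetical helper: mode() returns every value tied for most
    # frequent, in sorted order, so position 0 is the smallest of them.
    modes = prices.mode()
    return modes.iloc[0]

tea_modes = df.loc[df["Item"] == "Tea", "Price"].mode()
df["Most_Common_Price"] = df.groupby("Item")["Price"].transform(first_mode)
print(tea_modes.tolist())
print(df)
```

If a different tie-break is preferred (for example the largest tied price), modes.iloc[-1] would be the corresponding change under the same assumption about sorted output.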