I have a dataframe as follows:
User Bought
0 U296 PC
1 U300 Table
2 U296 PC
3 U296 Chair
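I want to create 2 columns. The first should show the item each user bought most. The second should show how many times that item was bought. In the end I want: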
User Bought Most_Bought Times_bought
0 U296 PC PC 2
1 U300 Table Table 1
2 U296 PC PC 2
3 U296 Chair PC 2
I know I should do something with groupby and use mode(), but I'm missing the final touch.
Thanks for your help!
Answer 0 (score: 2)
UPDATE:
In [330]: g = df.groupby('User')['Bought']
In [331]: vc = g.value_counts().to_frame(name='Times_bought').reset_index()
In [332]: df = df.merge(vc)
In [333]: df
Out[333]:
User Bought Times_bought
0 U296 PC 2
1 U296 PC 2
2 U300 Table 1
3 U296 Chair 1
In [334]: df['Most_Bought'] = df['User'].map(g.agg(lambda x: x.mode()[0]))
In [335]: df
Out[335]:
User Bought Times_bought Most_Bought
0 U296 PC 2 PC
1 U296 PC 2 PC
2 U300 Table 1 Table
3 U296 Chair 1 PC
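In this version, Times_bought counts how often each row's own item was bought. That is why the U296 / Chair row gets 1 instead of the 2 shown in the question.

If you want the count of the most-bought item instead, here is a minimal sketch. It assumes pandas and the sample data from the question. It reuses the same per-user value_counts and maps each user's maximum count back onto the rows. The name `counts` is introduced only for illustration.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({'User': ['U296', 'U300', 'U296', 'U296'],
                   'Bought': ['PC', 'Table', 'PC', 'Chair']})

g = df.groupby('User')['Bought']
# mode() returns the most frequent values sorted, so [0] picks the first one
df['Most_Bought'] = df['User'].map(g.agg(lambda x: x.mode()[0]))
# counts per (User, Bought) pair; the per-user maximum is the mode's count
counts = g.value_counts()
df['Times_bought'] = df['User'].map(counts.groupby(level=0).max())
print(df)
```

This keeps the original row order because `map` works row by row and does not merge.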
Old answer:
IIUC:
In [222]: x = df.groupby('User')['Bought'] \
...: .agg([lambda x: x.mode()[0], 'nunique']) \
...: .rename(columns={'<lambda>':'Most_Bought','nunique':'Times_bought'})
...:
In [223]: df.merge(x, left_on='User', right_index=True)
Out[223]:
User Bought Most_Bought Times_bought
0 U296 PC PC 2
2 U296 PC PC 2
3 U296 Chair PC 2
1 U300 Table Table 1
To keep the original row order:
In [258]: df.merge(x, left_on='User', right_index=True).reindex(df.index)
Out[258]:
User Bought Most_Bought Times_bought
0 U296 PC PC 2
1 U300 Table Table 1
2 U296 PC PC 2
3 U296 Chair PC 2
Helper DF:
In [224]: x
Out[224]:
Most_Bought Times_bought
User
U296 PC 2
U300 Table 1
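The 'nunique' aggregation above counts the distinct items per user. It yields 2 for U296 only because U296 bought two different things (PC and Chair). It is not the number of times PC was bought: a third PC purchase would still give 2.

Below is a sketch of the same helper-DataFrame idea that counts the most-bought item instead. It assumes pandas 0.25 or later, which added named aggregation. It joins on the index instead of merge + reindex, so the original row order is kept.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({'User': ['U296', 'U300', 'U296', 'U296'],
                   'Bought': ['PC', 'Table', 'PC', 'Chair']})

# helper DataFrame indexed by User; named aggregation avoids the '<lambda>' rename
x = df.groupby('User')['Bought'].agg(
    Most_Bought=lambda s: s.mode()[0],
    # value_counts() sorts counts in descending order, so the first is the mode's count
    Times_bought=lambda s: s.value_counts().iloc[0],
)
# a left join on the 'User' column keeps df's original row order
out = df.join(x, on='User')
print(out)
```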
Answer 1 (score: 2)
A rather long-winded way to get there :) using value_counts:
df[['Most_Bought','Times_bought']]=df.groupby('User').Bought.transform(lambda x : [pd.Series(x).value_counts()\
.reset_index().loc[0].values]).apply(pd.Series)
df
Out[231]:
User Bought Most_Bought Times_bought
0 U296 PC PC 2
1 U300 Table Table 1
2 U296 PC PC 2
3 U296 Chair PC 2
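The one-liner above returns a one-element list from `transform` and expands it with `apply(pd.Series)`. This depends on how the installed pandas version broadcasts that list back to each group, so it may fail with a length-mismatch error on recent versions.

Here is a simpler sketch of the same value_counts idea. It relies only on `transform` broadcasting a scalar result to every row of its group, and it assumes pandas and the sample data from the question.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({'User': ['U296', 'U300', 'U296', 'U296'],
                   'Bought': ['PC', 'Table', 'PC', 'Chair']})

g = df.groupby('User')['Bought']
# each lambda returns one scalar per group; transform repeats it for that group's rows
df['Most_Bought'] = g.transform(lambda s: s.value_counts().index[0])
df['Times_bought'] = g.transform(lambda s: s.value_counts().iloc[0])
print(df)
```

The two approaches differ when two items are tied for a user. Here the pick follows value_counts ordering, whereas `mode()[0]` takes the smallest tied value.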