Question

I have a dataframe that looks like this:

df =
   user_id  session_id purchase
0        1          34      yes
1        1          35       no
2        2          36       no
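To reproduce the question, the example frame above can be built directly. This is a minimal sketch that assumes only the three columns and three rows shown:

```python
import pandas as pd

# Rebuild the example frame from the question: three sessions, two users
df = pd.DataFrame({
    'user_id': [1, 1, 2],
    'session_id': [34, 35, 36],
    'purchase': ['yes', 'no', 'no'],
})
print(df)
```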

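Now I want to create 2 new columns that count all purchases for each user. Note that every row belonging to the same user should get the same values in the new columns, like this: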

df =
   user_id  session_id purchase  purchase_yes  purchase_no
0        1          34      yes             1            1
1        1          35       no             1            1
2        2          36       no             0            1

I tried this, but it doesn't work:

df['purchase_yes'] = df[df.purchase == 'yes'].groupby("user_id").purchase.sum()

It just gives NaN values.
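The NaN values come from index alignment. The groupby result is indexed by user_id, while df is indexed by row position (0, 1, 2). Assigning a Series to a column therefore matches index labels, not users, so most rows find no match and get NaN (row 1 only "matches" because the user_id label 1 happens to equal that row label). The sketch below shows this and then uses GroupBy.transform, a standard pandas alternative that is not used in either answer, to broadcast each per-user count back onto every row of that user:

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 1, 2],
    'session_id': [34, 35, 36],
    'purchase': ['yes', 'no', 'no'],
})

# The original attempt: the result is indexed by user_id, not by row position
per_user = df[df.purchase == 'yes'].groupby('user_id').purchase.sum()
print(per_user.index.tolist())  # [1]: a user_id label, not a row label

# Turn each condition into 0/1 flags, group them by user_id, and let
# transform('sum') return one value per original row (the per-user total)
df['purchase_yes'] = (df.purchase == 'yes').astype(int).groupby(df.user_id).transform('sum')
df['purchase_no'] = (df.purchase == 'no').astype(int).groupby(df.user_id).transform('sum')
print(df)
```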

Answer 1

Try this:

import pandas as pd

# count each purchase value per user: one row per user_id, one column per value
new_df = df.groupby('user_id').purchase.value_counts().unstack(fill_value=0)

# you can also use either of these
# new_df = pd.crosstab(df.user_id, df.purchase)
# new_df = df.pivot_table(index='user_id', columns='purchase', values='session_id', aggfunc='count', fill_value=0)

# rename the columns of new data
new_df.columns = 'purchase_' + new_df.columns

# merge the new data with the old on user_id (merge returns a new frame)
df = df.merge(new_df, left_on='user_id', right_index=True)
print(df)

Output:

   user_id  session_id purchase  purchase_no  purchase_yes
0        1          34      yes            1             1
1        1          35       no            1             1
2        2          36       no            1             0
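For a self-contained run of the crosstab alternative mentioned in the comments, the counts can also be attached with DataFrame.join on the user_id column instead of merge. This is a sketch that swaps in join and add_prefix; it is not the answer's own code:

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 1, 2],
    'session_id': [34, 35, 36],
    'purchase': ['yes', 'no', 'no'],
})

# One row per user, one column per purchase value, cells are counts
counts = pd.crosstab(df.user_id, df.purchase).add_prefix('purchase_')

# join looks up each row's user_id in counts' index and keeps df's row order
out = df.join(counts, on='user_id')
print(out)
```

Unlike the merge, join keeps the left frame's index by construction, so the original row labels survive unchanged.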

Answer 2

You can use groupby and value_counts to get the per-user counts:

a = df.groupby('user_id')['purchase'].value_counts().unstack(fill_value=0)
print(a)

purchase    no  yes
user_id     
1            1    1
2            1    0

Then map the counts back onto each row with pandas.Series.map:

df['purchase_yes'] = df['user_id'].map(a['yes'])
df['purchase_no'] = df['user_id'].map(a['no'])

Output:

   user_id  session_id purchase  purchase_yes  purchase_no
0        1          34      yes             1           1
1        1          35       no             1           1
2        2          36       no             0           1
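The two map calls generalize to a loop over the columns of a, so every purchase value found in the data gets its own column without hard-coding 'yes' and 'no'. The loop is an extension sketched here for illustration, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 1, 2],
    'session_id': [34, 35, 36],
    'purchase': ['yes', 'no', 'no'],
})

# Per-user counts: rows are user_id, columns are the purchase values
a = df.groupby('user_id')['purchase'].value_counts().unstack(fill_value=0)

# For each purchase value, look up every row's user_id in that count column
for value in a.columns:
    df[f'purchase_{value}'] = df['user_id'].map(a[value])
print(df)
```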

New column with a groupby condition not working in dataframe
