I have a DataFrame df:
>>> df
user_id group landing_page converted
12345 control old_page 0
12346 treatment new_page 1
12347 control new_page 1
12345 treatment old_page 0
12349 treatment old_page 1
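I want to return the number of times that treatment does not line up with new_page.
I have tried df[(df['group' == "treatment"]) != (df['landing_page'] == 'new_page')], but I keep getting an error message.
Also, is there a way to get the mean of converted using only unique users? Thanks in advance.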
Answer 0 (score: 1)
IIUC, you are looking for
>>> ((df['group'] == 'treatment') & (df['landing_page'] != 'new_page')).sum()
2
Details:
>>> df['group'] == 'treatment'
0 False
1 True
2 False
3 True
4 True
Name: group, dtype: bool
>>>
>>> df['landing_page'] != 'new_page'
0 True
1 False
2 False
3 True
4 True
Name: landing_page, dtype: bool
>>>
>>> (df['group'] == 'treatment') & (df['landing_page'] != 'new_page')
0 False
1 False
2 False
3 True
4 True
dtype: bool
>>>
>>> ((df['group'] == 'treatment') & (df['landing_page'] != 'new_page')).sum()
2
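As for why the attempt in the question raises an error, here is a minimal sketch. It assumes the sample rows shown in the question; the variable names (attempt_error, is_treatment, is_new_page, and so on) are made up for illustration. The bracket is misplaced, so Python compares two string literals before pandas ever sees them.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "user_id": [12345, 12346, 12347, 12345, 12349],
    "group": ["control", "treatment", "control", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "old_page"],
    "converted": [0, 1, 1, 0, 1],
})

# 'group' == "treatment" is evaluated first and is simply False,
# so the attempt asks for a column named False and raises KeyError.
try:
    df['group' == "treatment"]
    attempt_error = None
except KeyError as exc:
    attempt_error = exc

# With the bracket moved, both sides are boolean Series.
is_treatment = df["group"] == "treatment"
is_new_page = df["landing_page"] == "new_page"

# Rows where group and page disagree in either direction
# (this also counts control rows that received new_page).
mismatch_either = int((is_treatment != is_new_page).sum())

# Only treatment rows that did not get new_page, as counted above.
mismatch_treatment = int((is_treatment & ~is_new_page).sum())
```

Which of the two counts is wanted depends on whether control rows that landed on new_page should also count as mismatches.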
Answer 1 (score: 0)
IIUC, you just need to do the following:
len(df[(df.group=='treatment') & (df.landing_page != 'new_page')])
Output:
2
More generally, you can get the counts for every combination of group and landing_page with groupby:
>>> df.groupby(['group','landing_page']).size()
group landing_page
control new_page 1
old_page 1
treatment new_page 1
old_page 2
dtype: int64
This shows that there is only 1 row in the treatment group with new_page, and 2 rows in the treatment group with old_page.
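If a two-way table is easier to read than the stacked groupby output, pd.crosstab is one possible alternative. This is only a sketch on the sample rows from the question; the names table and treatment_old are illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "user_id": [12345, 12346, 12347, 12345, 12349],
    "group": ["control", "treatment", "control", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "old_page"],
    "converted": [0, 1, 1, 0, 1],
})

# Rows are groups, columns are landing pages, cells are row counts.
table = pd.crosstab(df["group"], df["landing_page"])

# Single cell lookup: treatment rows that were shown old_page.
treatment_old = int(table.loc["treatment", "old_page"])
```

The cells hold the same counts as the groupby result, just laid out with one axis per column.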
To group by user_id and get the mean of the converted column:
df.groupby('user_id').converted.mean()
# user_id
# 12345 0.0
# 12346 1.0
# 12347 1.0
# 12349 1.0
# Name: converted, dtype: float64
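To collapse that into a single conversion rate over unique users, two possible readings are sketched below. How a user with several rows should be treated is an assumption the question does not settle, and the variable names are illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "user_id": [12345, 12346, 12347, 12345, 12349],
    "group": ["control", "treatment", "control", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "old_page"],
    "converted": [0, 1, 1, 0, 1],
})

# Reading 1: average each user's rows first, then average across users,
# so every user carries equal weight.
per_user = df.groupby("user_id")["converted"].mean()
rate_equal_weight = per_user.mean()

# Reading 2: keep only the first row seen for each user_id, then average.
rate_first_row = df.drop_duplicates("user_id")["converted"].mean()
```

On this sample both readings give the same value, because the repeated user 12345 has converted equal to 0 in both rows; on real data they can differ.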