How do I return the number of times a value in one column does not line up with another value in another column?

Time: 2018-12-03 22:45:49

Tags: python pandas

I have a DataFrame df:

>>> df
   user_id   group       landing_page   converted
   12345     control     old_page       0
   12346     treatment   new_page       1
   12347     control     new_page       1
   12345     treatment   old_page       0
   12349     treatment   old_page       1

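I want to return the number of times treatment does not line up with new_page.

I have tried df[(df['group' == "treatment"]) != (df['landing_page'] == 'new_page')], but I keep getting an error.

Also, is there a way to get the mean of converted using unique users? Thanks in advance.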

2 Answers:

Answer 0: (score: 1)

IIUC, you are looking for

>>> ((df['group'] == 'treatment') & (df['landing_page'] != 'new_page')).sum()
2

Details:

>>> df['group'] == 'treatment'
0    False
1     True
2    False
3     True
4     True
Name: group, dtype: bool
>>> 
>>> df['landing_page'] != 'new_page'
0     True
1    False
2    False
3     True
4     True
Name: landing_page, dtype: bool
>>> 
>>> (df['group'] == 'treatment') & (df['landing_page'] != 'new_page')
0    False
1    False
2    False
3     True
4     True
dtype: bool
>>>
>>> ((df['group'] == 'treatment') & (df['landing_page'] != 'new_page')).sum()
2
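
As a side note on the error in the question, here is a minimal self-contained sketch. It assumes the frame looks exactly like the one posted, and the variable names (is_treatment, is_new_page, and so on) are only illustrative. In the original attempt the closing bracket sits in the wrong place, so Python compares the two strings 'group' and "treatment" first, gets False, and pandas then looks for a column labelled False. The sketch also shows the count if a mismatch in either direction is wanted, which may or may not be what was intended.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "user_id": [12345, 12346, 12347, 12345, 12349],
    "group": ["control", "treatment", "control", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "old_page"],
    "converted": [0, 1, 1, 0, 1],
})

# 'group' == "treatment" is evaluated before indexing and is simply False,
# so this is df[False], which fails because no column has that label.
try:
    df["group" == "treatment"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Build each boolean mask separately, then combine them.
is_treatment = df["group"] == "treatment"
is_new_page = df["landing_page"] == "new_page"

# Treatment rows that did not get new_page (what this answer counts).
treatment_without_new_page = int((is_treatment & ~is_new_page).sum())

# Rows that are out of line in either direction
# (also counts control rows that did get new_page).
mismatch_either_way = int((is_treatment != is_new_page).sum())

print(lookup_failed)               # True
print(treatment_without_new_page)  # 2
print(mismatch_either_way)         # 3
```

The second count is larger because row 2 (control with new_page) is also a mismatch.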

Answer 1: (score: 0)

Question 1:

IIUC, you just need to do the following:

len(df[(df.group=='treatment') & (df.landing_page != 'new_page')])

Output:

2

More generally, you can use groupby to get the counts for every combination of group and landing_page:

>>> df.groupby(['group','landing_page']).size()
group      landing_page
control    new_page        1
           old_page        1
treatment  new_page        1
           old_page        2
dtype: int64

This shows that there is only 1 row with group treatment and landing page new_page, and 2 rows with treatment and old_page.
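
If a two-way table is easier to read than the stacked groupby output, pd.crosstab is one possible alternative. This is only a sketch under the assumption that the frame matches the one in the question; the names counts and treatment_old are illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "user_id": [12345, 12346, 12347, 12345, 12349],
    "group": ["control", "treatment", "control", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "old_page"],
    "converted": [0, 1, 1, 0, 1],
})

# Rows are the groups, columns are the landing pages, cells are row counts.
counts = pd.crosstab(df["group"], df["landing_page"])
print(counts)

# A single cell can be read with .loc, e.g. treatment rows that saw old_page.
treatment_old = int(counts.loc["treatment", "old_page"])
print(treatment_old)  # 2
```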

Question 2:

Group by user_id and take the mean of the converted column:

df.groupby('user_id').converted.mean()

# user_id
# 12345    0.0
# 12346    1.0
# 12347    1.0
# 12349    1.0
# Name: converted, dtype: float64
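
If "using unique users" means a single overall mean in which each user counts once, there are at least two readings. The sketch below shows both. It assumes the frame from the question, and which reading is appropriate is a judgment call that the question does not settle. The variable names are illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "user_id": [12345, 12346, 12347, 12345, 12349],
    "group": ["control", "treatment", "control", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "old_page"],
    "converted": [0, 1, 1, 0, 1],
})

# Reading A (assumption): keep only the first row seen for each user_id.
first_rows = df.drop_duplicates(subset="user_id", keep="first")
mean_first_row = first_rows["converted"].mean()

# Reading B (assumption): average each user's rows, then average the users.
mean_of_user_means = df.groupby("user_id")["converted"].mean().mean()

# For comparison: the plain row-level mean counts user 12345 twice.
row_level_mean = df["converted"].mean()

print(mean_first_row, mean_of_user_means, row_level_mean)
```

With this data both readings give 0.75, while the row-level mean is 0.6. The two readings would differ if a duplicated user had different converted values across rows.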