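Python: compare and count row values

Time: 2018-09-20 19:21:46

Tags: python pandas count compare

I want to compare two columns row by row and count the rows in which a particular pairing of values is incorrect. For example: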

group       landing_page 
control     new_page
control     old_page
treatment   new_page
treatment   old_page
control     old_page

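I want to count the number of times treatment is not paired with new_page, or control is not paired with old_page. I suppose it could also be the reverse: counting the times treatment equals new_page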

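3 answers:

Answer 0 (score: 1)

Use pandas groupby to find the count of each group/landing page pair.

Use groupby again to find the total for each group. To find how many rows in each group have a different landing page, subtract each landing page count from its group total.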

import pandas as pd

df = pd.DataFrame({'group': ['control', 'control', 'treatment',
                             'treatment', 'control'],
                   'landing_page': ['new_page', 'old_page', 'new_page',
                                    'old_page', 'old_page']})

# find counts per pairing
df_out = df.groupby(['group', 'landing_page'])['landing_page'].count().to_frame() \
    .rename(columns={'landing_page': 'count'}).reset_index()
# find totals for groups
df_out['grp_total'] = df_out.groupby('group')['count'].transform('sum')
# find count not equal to landing page
df_out['inverse_count'] = df_out['grp_total'] - df_out['count']

print(df_out)

       group landing_page  count  grp_total  inverse_count
0    control     new_page      1          3              2
1    control     old_page      2          3              1
2  treatment     new_page      1          2              1
3  treatment     old_page      1          2              1
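
The table above lists every pairing, so the mismatch count per group still has to be read off the right rows. A minimal sketch of that last step follows, assuming the expected pairing is the one described in the question (control with old_page, treatment with new_page). The `expected` dict and the `mismatches` name are illustrative and not part of the original answer, and `size()` is used here only as a shorter way to build the same pair counts.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['control', 'control', 'treatment', 'treatment', 'control'],
    'landing_page': ['new_page', 'old_page', 'new_page', 'old_page', 'old_page'],
})

# Same pair counts as above, built with size() instead of count()
df_out = df.groupby(['group', 'landing_page']).size().reset_index(name='count')
df_out['grp_total'] = df_out.groupby('group')['count'].transform('sum')
df_out['inverse_count'] = df_out['grp_total'] - df_out['count']

# Assumption: the expected pairing, taken from the question text
expected = {'control': 'old_page', 'treatment': 'new_page'}

# Keep only the row holding the expected page for each group;
# its inverse_count is the number of mismatched rows in that group
is_expected = df_out['landing_page'] == df_out['group'].map(expected)
mismatches = df_out.loc[is_expected].set_index('group')['inverse_count']

print(mismatches)
print(mismatches.sum())
```

One caveat with this approach: a group that never appears with its expected page has no matching row in df_out, so it would be missing from the result rather than reported with its full count.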

Answer 1 (score: 0)

This sounds like a job for the zip() function.

First, set up the inputs and the counters:

group = ["control", "control", "treatment", "treatment", "control"]
landingPage = ["new_page", "old_page", "new_page", "old_page", "old_page"]

treatmentNotNew = 0
controlNotOld = 0

Then zip the two inputs you want to compare into an iterator of tuples:

zipped = zip(group, landingPage)

Now you can iterate over the tuple values a (group) and b (landing page), counting each occurrence of treatment != new_page and of control != old_page:

for a, b in zipped:
    if a == "treatment" and b != "new_page":
        treatmentNotNew += 1

    if a == "control" and b != "old_page":
        controlNotOld += 1

Finally, print the results!

print("treatmentNotNew = " + str(treatmentNotNew))
print("controlNotOld = " + str(controlNotOld))

>> treatmentNotNew = 1
>> controlNotOld = 1
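
If more group labels are added later, the two hand-written counters could be replaced with a lookup table. The following is only a sketch of that variation, not the answer's own code: the `expected` dict and the use of `collections.Counter` are assumptions introduced for illustration, and `landing_page` is the same list as `landingPage` above.

```python
from collections import Counter

group = ["control", "control", "treatment", "treatment", "control"]
landing_page = ["new_page", "old_page", "new_page", "old_page", "old_page"]

# Assumption: the page each group is expected to see, per the question
expected = {"control": "old_page", "treatment": "new_page"}

# Count one mismatch per row whose page differs from the expected one,
# keyed by the group label of that row
mismatches = Counter(
    g for g, page in zip(group, landing_page) if expected[g] != page
)

print("treatment mismatches =", mismatches["treatment"])
print("control mismatches =", mismatches["control"])
```

Note that `expected[g]` raises a KeyError for a group label that is not in the dict, which may or may not be the behaviour you want for unexpected data.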

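Answer 2 (score: 0)

I would use map to create a new column that holds the landing page you expect for each group. You can then easily test whether the new mapping column equals the landing_page column.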

import pandas as pd

df = pd.DataFrame({
    'group': ['control', 'control', 'treatment', 'treatment', 'control'],
    'landing_page': ['old_page', 'old_page', 'new_page', 'old_page', 'new_page']
})

df['mapping'] = df.group.map({'control': 'old_page', 'treatment': 'new_page'})

(df['landing_page'] != df['mapping']).sum()
# 2
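
The total above does not say which group the mismatches come from. A possible extension, sketched here under the same expected pairing, groups the boolean mismatch mask by the group column. The names `expected`, `mismatch`, and `per_group` are illustrative, and the data is the same frame used in this answer (which differs slightly from the sample in the question).

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['control', 'control', 'treatment', 'treatment', 'control'],
    'landing_page': ['old_page', 'old_page', 'new_page', 'old_page', 'new_page']
})

# Assumption: the expected pairing, as in the mapping used above
expected = {'control': 'old_page', 'treatment': 'new_page'}

# True wherever the actual page differs from the expected one
mismatch = df['landing_page'] != df['group'].map(expected)

# Summing booleans counts the True values within each group
per_group = mismatch.groupby(df['group']).sum()

print(per_group)
```

Here `per_group['control']` and `per_group['treatment']` give the two separate counts the question asks about, and `mismatch.sum()` still gives the combined total.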