I want to compare two columns row by row and count the rows where a specific pairing of values is incorrect. For example:
group landing_page
control new_page
control old_page
treatment new_page
treatment old_page
control old_page
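I want to count the number of times treatment is not paired with new_page, or control is not paired with old_page. I suppose it could also be done the other way around, by counting the rows where treatment equals new_page.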
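Answer 0 (score: 1)
Use pandas groupby to count each group/landing page pair.
Then use groupby again to find the total for each group. To find how many rows in each group have the other landing page, subtract each pair count from its group total.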
import pandas as pd

df = pd.DataFrame({'group': ['control', 'control', 'treatment',
                             'treatment', 'control'],
                   'landing_page': ['new_page', 'old_page', 'new_page',
                                    'old_page', 'old_page']})
# find counts per pairing
df_out = (df.groupby(['group', 'landing_page'])['landing_page'].count().to_frame()
            .rename(columns={'landing_page': 'count'}).reset_index())
# find totals for groups
df_out['grp_total'] = df_out.groupby('group')['count'].transform('sum')
# find count of rows in the group that have a different landing page
df_out['inverse_count'] = df_out['grp_total'] - df_out['count']
print(df_out)
       group landing_page  count  grp_total  inverse_count
0    control     new_page      1          3              2
1    control     old_page      2          3              1
2  treatment     new_page      1          2              1
3  treatment     old_page      1          2              1
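The same pair counts can also be read off a contingency table. The following is only a sketch, assuming the same sample data as above; the names pair_counts, treatment_not_new and control_not_old are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'group': ['control', 'control', 'treatment', 'treatment', 'control'],
    'landing_page': ['new_page', 'old_page', 'new_page', 'old_page', 'old_page'],
})

# Rows are groups, columns are landing pages, cells are pair counts.
pair_counts = pd.crosstab(df['group'], df['landing_page'])

# The two mismatched pairings described in the question
# (hypothetical variable names).
treatment_not_new = int(pair_counts.loc['treatment', 'old_page'])
control_not_old = int(pair_counts.loc['control', 'new_page'])

print(pair_counts)
print(treatment_not_new, control_not_old)
```

With only two possible landing pages, "not new_page" is the same as "old_page", which is why a single cell lookup per group is enough here. With more page values, summing the other columns of the row would be needed instead.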
Answer 1 (score: 0)
This sounds like a job for the zip() function.
First, set up the inputs and the counters:
group = ["control", "control", "treatment", "treatment", "control"]
landingPage = ["new_page", "old_page", "new_page", "old_page", "old_page"]
treatmentNotNew = 0
controlNotOld = 0
Then zip the two inputs to compare into an iterator of tuples:
zipped = zip(group, landingPage)
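To see what the zipped iterator yields, here is a small sketch that materializes the pairs into a list. It assumes Python 3, where zip() returns a one-shot iterator; the names pairs and one_shot are hypothetical and used only for this illustration.

```python
group = ["control", "control", "treatment", "treatment", "control"]
landingPage = ["new_page", "old_page", "new_page", "old_page", "old_page"]

# list() consumes the iterator and keeps the tuples for inspection.
pairs = list(zip(group, landingPage))
print(pairs[:2])

# A zip object can only be walked once: a second pass yields nothing.
one_shot = zip(group, landingPage)
first_pass = list(one_shot)
second_pass = list(one_shot)
print(len(first_pass), len(second_pass))
```

If the pairs need to be looped over more than once, keeping them in a list like pairs avoids an empty second pass.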
Now you can loop over the tuple values a (the group) and b (the landing page), counting each occurrence of treatment != new_page and control != old_page:
for a, b in zipped:
    # treatment rows that did not get the new page
    if a == "treatment" and b != "new_page":
        treatmentNotNew += 1
    # control rows that did not get the old page
    if a == "control" and b != "old_page":
        controlNotOld += 1
Finally, print the results!
print("treatmentNotNew = " + str(treatmentNotNew))
print("controlNotOld = " + str(controlNotOld))
>> treatmentNotNew = 1
>> controlNotOld = 1
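As a more compact variation on the loop above, the pairs could be tallied first with collections.Counter and the mismatches summed afterwards. This is a sketch under the same sample inputs, not the answerer's method; pair_counts is a hypothetical name.

```python
from collections import Counter

group = ["control", "control", "treatment", "treatment", "control"]
landingPage = ["new_page", "old_page", "new_page", "old_page", "old_page"]

# Tally how often each (group, landing page) pair occurs.
pair_counts = Counter(zip(group, landingPage))

# Sum the tallies of the pairs that count as mismatches.
treatmentNotNew = sum(n for (g, page), n in pair_counts.items()
                      if g == "treatment" and page != "new_page")
controlNotOld = sum(n for (g, page), n in pair_counts.items()
                    if g == "control" and page != "old_page")

print("treatmentNotNew = " + str(treatmentNotNew))
print("controlNotOld = " + str(controlNotOld))
```

The Counter keeps every pair count around, so other questions (for example, how many correct pairings there were) can be answered without looping over the data again.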
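Answer 2 (score: 0)
I would use map to create a new column that maps each group to the landing page you expect for it. You can then easily test whether the new mapping column equals the landing_page column.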
import pandas as pd

df = pd.DataFrame({
    'group': ['control', 'control', 'treatment', 'treatment', 'control'],
    'landing_page': ['old_page', 'old_page', 'new_page', 'old_page', 'new_page']
})
# expected landing page for each group
df['mapping'] = df['group'].map({'control': 'old_page', 'treatment': 'new_page'})
# count rows where the actual page differs from the expected one
(df['landing_page'] != df['mapping']).sum()
# 2
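If the mismatches are wanted per group rather than as one total, the boolean comparison can be grouped by the group column. A minimal sketch, assuming the same sample data as this answer; expected, mismatch and per_group are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['control', 'control', 'treatment', 'treatment', 'control'],
    'landing_page': ['old_page', 'old_page', 'new_page', 'old_page', 'new_page']
})

# Expected landing page for each row, based on its group.
expected = df['group'].map({'control': 'old_page', 'treatment': 'new_page'})

# True where the actual landing page differs from the expected one.
mismatch = df['landing_page'] != expected

# Summing booleans counts the True values within each group.
per_group = mismatch.groupby(df['group']).sum()
print(per_group)
```

This keeps the mapping idea from the answer but skips the helper column, since the expected values are only needed for the comparison.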