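I have a DataFrame and I am trying to find the rows where two columns do not match. For example, the column landing_page can be either new_page or old_page, and the column group can be either control or treatment. Currently I am using: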
no_line_up = df.query('group = treatment and landing_page = old_page or group = control and landing_page = new_page')
I am trying to find the rows where new_page and treatment do not line up, but this throws an error. What is the correct way to do this?
Answer 0 (score: 1)
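With pd.DataFrame.query you still need to use the usual operators, for example == to test for equality, and string values must be quoted. Use parentheses to keep the conditions separate: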
import pandas as pd

df = pd.DataFrame({'group': ['treatment', 'control', 'hello'],
                   'landing_page': ['old_page', 'new_page', 'test']})
res = df.query('(group == "treatment" and landing_page == "old_page") \
or (group == "control" and landing_page == "new_page")')
print(res)
       group landing_page
0  treatment     old_page
1    control     new_page
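As a variation on the query above, the comparison values can be held in Python variables and referenced inside the query string with an @ prefix, which avoids nested quotes. This is only a minimal sketch; the variable names treat, ctrl, old, and new are made up for illustration and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({'group': ['treatment', 'control', 'hello'],
                   'landing_page': ['old_page', 'new_page', 'test']})

# Hypothetical variable names, introduced only for this sketch.
treat, ctrl = 'treatment', 'control'
old, new = 'old_page', 'new_page'

# '@name' inside the query string refers to a Python variable in the calling scope.
# The two adjacent string literals are concatenated into one query expression.
res_vars = df.query('(group == @treat and landing_page == @old) '
                    'or (group == @ctrl and landing_page == @new)')
print(res_vars)
```

This should select the same two rows as the quoted-literal query.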
A more readable alternative is to combine Boolean masks and use pd.DataFrame.loc:
m1 = (df['group'] == 'treatment') & (df['landing_page'] == 'old_page')
m2 = (df['group'] == 'control') & (df['landing_page'] == 'new_page')
# A row is a mismatch if either mask holds, so combine with | (m1 & m2 would always be empty).
res = df.loc[m1 | m2]
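If, as the question describes, group only takes the values control and treatment and landing_page only takes new_page and old_page, the mismatched rows can also be sketched as a single comparison of two Boolean Series: a row is out of line when "is treatment" and "is new_page" disagree. The sample data and the names is_treatment, is_new_page, and mismatch below are assumptions made for illustration.

```python
import pandas as pd

# Illustrative data covering all four combinations (not from the question).
df = pd.DataFrame({'group': ['treatment', 'control', 'treatment', 'control'],
                   'landing_page': ['old_page', 'new_page', 'new_page', 'old_page']})

is_treatment = df['group'] == 'treatment'
is_new_page = df['landing_page'] == 'new_page'

# Element-wise inequality of two Boolean Series acts as XOR:
# True where exactly one of the two conditions holds.
mismatch = df.loc[is_treatment != is_new_page]
print(mismatch)
```

Note that this shortcut relies on the two-value assumption: a row whose group is neither control nor treatment would be judged only by whether its landing_page is new_page.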
Answer 1 (score: 0)
Perhaps:
df.loc[((df['group'] == 'treatment') & (df['landing_page'] == 'old_page')) | ((df['group'] == 'control') & (df['landing_page'] == 'new_page'))]