Question

Consider the following.

import pandas as pd
d=pd.DataFrame([[1,'a'],[1,'b'],[2,'c'],[2,'a'],[3,'c'],[4,'a'],[4,'c']],columns=['A','B'])

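I want the values in A that are associated exclusively with c (c and only c). There is only one such value here: 3. I wrote the following query, but it does not return the correct result.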

d[ d.B.isin(['c'])  & ~d.A.isin(d[d.B.isin(set(d.B.unique())-{'c'})].A.to_frame()) ].A.to_frame()

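My idea was to find all values in A that are associated with 'c', and then remove those that are also associated with anything other than 'c'. But the code just returns every A value that has a 'c' associated with it. Can someone help me with this? Thanks.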

Answer 1

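The simplest idea is to filter for rows where B is c and exclude any A value that appears more than once in column A. This relies on each (A, B) pair occurring at most once, as in your data: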

s1 = d.loc[d.B.eq('c') & ~d.A.duplicated(keep=False), 'A']
print(s1)
4    3
Name: A, dtype: int64
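
If the data could contain repeated (A, B) pairs, for example two rows of 3, 'c', the duplicated check above would drop that A value. A minimal sketch of a grouped alternative, assuming the same frame d as in the question (the names only_c and result are just illustrative):

```python
import pandas as pd

d = pd.DataFrame(
    [[1, 'a'], [1, 'b'], [2, 'c'], [2, 'a'], [3, 'c'], [4, 'a'], [4, 'c']],
    columns=['A', 'B'],
)

# For each value of A, check whether every associated B equals 'c'.
only_c = d.B.eq('c').groupby(d.A).all()

# Keep the A labels for which the check holds.
result = only_c[only_c].index.tolist()
print(result)  # [3]
```

This returns the A values themselves rather than the matching rows, so the original row index (4) is not preserved.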

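Your solution works once .to_frame() is removed, but it is better to select by mask with loc (evaluation order matters). The loc version is shown first, followed by your original query with .to_frame() removed: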

s2 = d.loc[ d.B.isin(['c'])  & ~d.A.isin(d.loc[d.B.isin(set(d.B.unique())-{'c'}), 'A']), 'A']
print(s2)
4    3
Name: A, dtype: int64

s3 = d[ d.B.isin(['c'])  & ~d.A.isin(d[d.B.isin(set(d.B.unique())-{'c'})].A) ].A
print(s3)
4    3
Name: A, dtype: int64
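
As a rough illustration of why .to_frame() matters here: a Series hands its values to isin, whereas iterating a DataFrame yields its column labels, which is most likely why the original query filtered nothing out. A small sketch, assuming the same frame d (not_c and mask are illustrative names):

```python
import pandas as pd

d = pd.DataFrame(
    [[1, 'a'], [1, 'b'], [2, 'c'], [2, 'a'], [3, 'c'], [4, 'a'], [4, 'c']],
    columns=['A', 'B'],
)

# A values that have at least one B other than 'c'.
not_c = d.loc[d.B.ne('c'), 'A']

# The Series carries the actual A values ...
print(not_c.tolist())          # [1, 1, 2, 4]
# ... while iterating the one-column frame gives only the column label.
print(list(not_c.to_frame()))  # ['A']

# Passing the Series to isin excludes A values 1, 2 and 4 as intended.
mask = d.B.eq('c') & ~d.A.isin(not_c)
print(d.loc[mask, 'A'].tolist())  # [3]
```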