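I have two lists that contain terms as strings. The terms fall into two categories: fruits and vehicles. I am trying to display only the rows of a DataFrame whose pairs contain terms from conflicting categories. What is the best way to do this? Below is a sample of my lists and DataFrame. Any help would be greatly appreciated!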
dataframe:
col 1
['apple', 'truck' ]
['truck', 'orange']
['pear', 'motorcycle']
['pear', 'orange' ]
['apple', 'pear' ]
['truck', 'car' ]
vehicles = ['car', 'truck', 'motorcycle']
fruits = ['apple', 'orange', 'pear']
desired output:
col 2
['apple', 'truck' ]
['pear', 'motorcycle']
['truck', 'orange']
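To reproduce the example, the input frame can be built as below. This is a sketch that assumes each cell of col 1 holds a plain Python list of two strings, which is what the answers expect.

```python
import pandas as pd

# Category lists copied from the question
vehicles = ['car', 'truck', 'motorcycle']
fruits = ['apple', 'orange', 'pear']

# Each cell of 'col 1' holds a Python list with two terms
df = pd.DataFrame({'col 1': [
    ['apple', 'truck'],
    ['truck', 'orange'],
    ['pear', 'motorcycle'],
    ['pear', 'orange'],
    ['apple', 'pear'],
    ['truck', 'car'],
]})
print(df)
```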
Answer 0 (score: 5)
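Create a DataFrame from the column of lists, test membership with DataFrame.isin, invert the mask with ~, and use DataFrame.any to check for at least one True per row. Do this for both lists, chain the two conditions with bitwise AND (&), and filter with boolean indexing: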
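import pandas as pd
# expand the two-element lists into columns 0 and 1; keep df's index so the mask aligns
df1 = pd.DataFrame(df['col 1'].values.tolist(), index=df.index)
# keep rows that are neither all vehicles nor all fruits
df = df[(~df1.isin(vehicles)).any(axis=1) & (~df1.isin(fruits)).any(axis=1)]
print(df)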
col 1
0 [apple, truck]
1 [truck, orange]
2 [pear, motorcycle]
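The steps above can be traced on the sample data. This sketch assumes the question's column name col 1 and that every list has the same length, so tolist() yields a rectangular frame; the intermediate names not_all_vehicles and not_all_fruits are only illustrative.

```python
import pandas as pd

vehicles = ['car', 'truck', 'motorcycle']
fruits = ['apple', 'orange', 'pear']
df = pd.DataFrame({'col 1': [['apple', 'truck'], ['truck', 'orange'],
                             ['pear', 'motorcycle'], ['pear', 'orange'],
                             ['apple', 'pear'], ['truck', 'car']]})

# Expand the lists into one column per position (columns 0 and 1)
df1 = pd.DataFrame(df['col 1'].tolist(), index=df.index)

# True where a row has at least one term that is NOT a vehicle
not_all_vehicles = (~df1.isin(vehicles)).any(axis=1)
# True where a row has at least one term that is NOT a fruit
not_all_fruits = (~df1.isin(fruits)).any(axis=1)

# A row mixes categories if it is neither all vehicles nor all fruits
mask = not_all_vehicles & not_all_fruits
print(mask.tolist())  # → [True, True, True, False, False, False]
print(df[mask])
```

Note that the mask reads "not all vehicles and not all fruits", so a term found in neither list (for example a hypothetical 'banana' paired with 'apple') would let a row through; the set-based solution below instead requires one term from each list.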
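Another solution uses sets: intersect each list with the vehicle set and with the fruit set, chain the two intersections with and (each row is checked one at a time, so these are scalar operands rather than arrays), and convert the result to bool. An empty set converts to False: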
def func(x):
    s = set(x)
    v = set(vehicles)
    f = set(fruits)
    # True only if the row shares at least one term with each category
    return bool((s & v) and (s & f))
df = df[df['col 1'].apply(func)]
print(df)
col 1
0 [apple, truck]
1 [truck, orange]
2 [pear, motorcycle]
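Why and rather than & here: the sketch below shows the intermediate values. The function name mixes_categories is hypothetical; the sets are built once outside the function, which is a small change from the answer above.

```python
vehicles = {'car', 'truck', 'motorcycle'}
fruits = {'apple', 'orange', 'pear'}

def mixes_categories(pair):
    """Return True if the pair has at least one vehicle and one fruit."""
    s = set(pair)
    # `x and y` evaluates to x when x is falsy (an empty set), else to y
    return bool((s & vehicles) and (s & fruits))

s = {'apple', 'truck'}
print((s & vehicles) and (s & fruits))  # → {'apple'}
s = {'pear', 'orange'}
print((s & vehicles) and (s & fruits))  # → set()

# `&` would intersect the two intersections instead, which is always
# empty here because the two categories share no terms
s = {'apple', 'truck'}
print((s & vehicles) & (s & fruits))  # → set()

print([mixes_categories(p) for p in [['apple', 'truck'], ['pear', 'orange'], ['truck', 'car']]])
# → [True, False, False]
```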
Answer 1 (score: 0)
np.isin might work for you!
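import numpy as np
# one row per category; np.array needs both lists to have the same length
super_set = np.array([vehicles, fruits])
def f(x):
    # count matches per category and require at least one in each
    return all(np.isin(super_set, x).sum(axis=1))
df[df['col 1'].apply(f)]
col 1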
0 [apple, truck]
1 [truck, orange]
2 [pear, motorcycle]
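To see what f checks, here is the intermediate np.isin result for two sample pairs. This is a sketch assuming both category lists have three terms, so np.array builds a 2x3 array with vehicles in row 0 and fruits in row 1.

```python
import numpy as np

vehicles = ['car', 'truck', 'motorcycle']
fruits = ['apple', 'orange', 'pear']
# 2x3 array: row 0 = vehicles, row 1 = fruits
super_set = np.array([vehicles, fruits])

hits = np.isin(super_set, ['apple', 'truck'])
print(hits)
# [[False  True False]
#  [ True False False]]
print(hits.sum(axis=1))       # → [1 1]: one vehicle and one fruit
print(all(hits.sum(axis=1)))  # → True

hits = np.isin(super_set, ['truck', 'car'])
print(hits.sum(axis=1))       # → [2 0]: no fruit
print(all(hits.sum(axis=1)))  # → False
```

If the two lists had different lengths, np.array([vehicles, fruits]) would be ragged, and recent NumPy versions raise a ValueError for that; calling np.isin on each list separately avoids the restriction.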