Say I have:
import pandas as pd
mylist = ["test", "new"]
df = pd.DataFrame([[["test", "whatever"]], [["tes", "test_in"]], [["new2", "new1"]]], columns=["a"])
df
a
0 [test, whatever]
1 [tes, test_in]
2 [new2, new1]
I want to filter the DataFrame and keep only the rows whose list contains at least one value from mylist:
a
0 [test, whatever]
I can't do:
df.query("a.str.contains('|'.join(@mylist))", engine='python')
because then I get partial matches.
I was thinking of something like:
df[df.apply(lambda x: set(x['a']) & set(mylist), axis=1)]
but that doesn't work.
Answer 0 (score: 2)
Rebuild the list column as a DataFrame, then use isin to check for matches:
df[pd.DataFrame(df.a.tolist()).isin(mylist).any(axis=1)]
Out[23]:
a
0 [test, whatever]
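A possible variation on the same exact-match idea, sketched under the assumption that pandas 0.25 or later is installed (so that Series.explode is available): explode the list column to one element per row, test each element with isin, then collapse back to one boolean per original row. The names exploded and mask are illustrative and not part of the answer above.

```python
import pandas as pd

mylist = ["test", "new"]
df = pd.DataFrame({"a": [["test", "whatever"], ["tes", "test_in"], ["new2", "new1"]]})

# One row per list element; the original row label is repeated in the index.
exploded = df["a"].explode()

# Exact membership test per element, then "any" per original row label.
mask = exploded.isin(mylist).groupby(level=0).any()

result = df[mask]
print(result)
```

Because explode keeps the original row labels, this sketch should also work when df has a unique index other than the default 0..n-1, whereas a temporary DataFrame built from tolist() gets a fresh default index.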
Answer 1 (score: 2)
You were close. Just convert the result to bool, so that an empty set becomes False and a non-empty set becomes True:
df = df[df['a'].apply(lambda x: bool(set(x) & set(mylist)))]
print (df)
a
0 [test, whatever]
Alternative:
df = df[[bool(set(x) & set(mylist)) for x in df['a']]]
Or:
df = df[[bool(set(x).intersection(mylist)) for x in df['a']]]
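The variants above rebuild set(mylist) or set(x) for every row. A minimal sketch of one way to avoid that, assuming the list elements are hashable: build the lookup set once and use set.isdisjoint, which accepts any iterable. The helper name rows_with_any is hypothetical and used only for illustration.

```python
import pandas as pd

def rows_with_any(df, column, wanted):
    """Keep rows whose list in `column` shares at least one element with `wanted`."""
    wanted_set = set(wanted)  # built once, not once per row
    # isdisjoint returns True when nothing is shared, so negate it.
    mask = df[column].map(lambda items: not wanted_set.isdisjoint(items))
    return df[mask]

mylist = ["test", "new"]
df = pd.DataFrame({"a": [["test", "whatever"], ["tes", "test_in"], ["new2", "new1"]]})

filtered = rows_with_any(df, "a", mylist)
print(filtered)
```

This compares whole elements, so "test_in" and "new2" do not count as matches for "test" and "new".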
Answer 2 (score: 1)
This works for me:
import pandas as pd

mylist = ["test", "new"]
df = pd.DataFrame([[["test", "whatever"]], [["tes", "test_in"]], [["new2", "new1"]]], columns=["a"])
print(df)

def func(row):
    # Return True as soon as any element of the row's list is in mylist
    for e in row["a"]:
        if e in mylist:
            return True
    return False

# Apply row-wise and keep only the rows for which func returned True
df = df.loc[df.apply(func, axis=1), :]
print(df)
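The explicit loop can also be written with the built-in any(), applied to the column directly rather than row by row. This is only a sketch of an equivalent form. The extra empty-list row is a hypothetical addition to show that rows with no elements are dropped.

```python
import pandas as pd

mylist = ["test", "new"]
# The last (empty) row is an illustrative addition, not part of the original data.
df = pd.DataFrame({"a": [["test", "whatever"], ["tes", "test_in"], ["new2", "new1"], []]})

# any() stops at the first exact match and returns False for an empty list.
mask = df["a"].apply(lambda items: any(e in mylist for e in items))

kept = df[mask]
print(kept)
```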