Question

I have this DataFrame:

import pandas as pd

df = pd.DataFrame([[1, "type_1"], [2, "type_2"], [2, "type_1; type_2"], [2, "type_1; type_3"], [2, "type_3"], [2, "type_1; type_2, type_3"]],
                  columns=["a", "b"])
    a   b
0   1   type_1
1   2   type_2
2   2   type_1; type_2
3   2   type_1; type_3
4   2   type_3
5   2   type_1; type_2, type_3

I need to run many query strings that I read from a config file, like this:

my_list = ["type_1", "type_2"]
df.query("a == 2 and b in @my_list")

Current output:

    a   b
1   2   type_2

But I would like the output to look like this, because at least one of the values in b is in my_list:

    a   b
0   2   type_2
1   2   type_1; type_2
2   2   type_1; type_3
3   2   type_1; type_2, type_3

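The problem, as you can see, is that some of my columns are effectively lists. Right now they are strings separated by ;, but I could convert them to lists. However, I am not sure how that would help me filter the rows that have at least one value from my_list inside column b using only .query() (otherwise I would have to parse the query strings myself, and that gets messy).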

This would be the equivalent code with lists:

pd.DataFrame([[1, ["type_1"]], [2, ["type_2"]], [2, ["type_1", "type_2"]], [2, ["type_1", "type_3"]], [2, ["type_3"]], [2, ["type_1", "type_2", "type_3"]]],
                     columns=["a", "b"])

Answer 1

Actually, I was wrong. It looks like calling str.contains inside query is supported by the "python" engine.

df.query("a == 2 and b.str.contains('|'.join(@my_list))", engine='python')

   a                       b
1  2                  type_2
2  2          type_1; type_2
3  2          type_1; type_3
5  2  type_1; type_2, type_3
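
For anyone who wants to run this end to end, here is a self-contained sketch of the same query. The re.escape step and the pattern variable are my additions, not part of the answer above; they only matter if the values in my_list could contain regex metacharacters, since str.contains treats the joined string as a regular expression by default.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [[1, "type_1"], [2, "type_2"], [2, "type_1; type_2"],
     [2, "type_1; type_3"], [2, "type_3"], [2, "type_1; type_2, type_3"]],
    columns=["a", "b"],
)
my_list = ["type_1", "type_2"]

# Escape each value so it is matched literally, then OR the values together.
pattern = "|".join(re.escape(t) for t in my_list)

# engine="python" lets the .str accessor be used inside the query string.
result = df.query("a == 2 and b.str.contains(@pattern)", engine="python")

print(result)  # keeps the rows with index 1, 2, 3 and 5
```

Building the pattern outside the query string also keeps the query itself short, which may help if the query strings come from a config file.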

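(Old answer) Your query can be split into two parts: the part that needs a substring check, and everything else.

You can compute the two masks separately. I suggest using str.contains and DataFrame.eval. You can then AND the masks together and use the result to filter df.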

m1 = df.eval("a == 2")
m2 = df['b'].str.contains('|'.join(my_list))

df[m1 & m2]

   a                       b
1  2                  type_2
2  2          type_1; type_2
3  2          type_1; type_3
5  2  type_1; type_2, type_3
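
One caveat with str.contains in either version: it does substring matching, so a pattern for type_1 would also match a value such as type_11. Below is a minimal sketch of the two-mask approach with word boundaries added, assuming that distinction matters for your data. The type_11 row is hypothetical and is there only to show the difference.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [[2, "type_1; type_3"], [2, "type_11"], [2, "type_3; type_2"]],
    columns=["a", "b"],
)
my_list = ["type_1", "type_2"]

# \b anchors each alternative at word boundaries, so "type_1" does not
# match inside "type_11". The group is non-capturing to avoid the pandas
# warning about patterns with match groups.
pattern = r"\b(?:" + "|".join(re.escape(t) for t in my_list) + r")\b"

m1 = df.eval("a == 2")              # everything except the substring check
m2 = df["b"].str.contains(pattern)  # the substring check itself

result = df[m1 & m2]
print(result)  # rows 0 and 2; the "type_11" row is dropped
```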

Answer 2

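You can use str.split to recreate the lists from the column, then use isin together with any. Note that isin is an exact match, which means that if you have type_11, isin will return False.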
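
The code from this answer did not survive on this page, so the following is only a sketch of the idea and not necessarily what the answerer wrote. It assumes pandas 1.4 or later (for the regex argument of str.split) and that both ; and , act as separators, as in the last row of the question's data. The type_11 row is hypothetical, added to show the exact-match behaviour.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, "type_1"], [2, "type_2"], [2, "type_1; type_2"],
     [2, "type_1; type_3"], [2, "type_3"], [2, "type_1; type_2, type_3"],
     [2, "type_11"]],  # hypothetical extra row
    columns=["a", "b"],
)
my_list = ["type_1", "type_2"]

# Split each string on ";" or ",", dropping whitespace around the separator.
tokens = df["b"].str.split(r"\s*[;,]\s*", regex=True)

# explode() yields one row per token and repeats the original index, so
# grouping on the index answers "does any token of this row match?".
has_match = tokens.explode().isin(my_list).groupby(level=0).any()

result = df[(df["a"] == 2) & has_match]
print(result)  # rows 1, 2, 3 and 5; "type_11" is not an exact match
```

Note that this filters with a boolean mask rather than with .query() alone, so it does not fully meet the "only .query()" constraint from the question.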

Using the query function in pandas to return rows at the intersection of two lists
