Question

I have a DataFrame in which one column contains strings and another column contains lists of strings.

                     0                     1
0      apples are good      [orange, banana]
1     bananas are good        [bananas, bad]
2  cucumbers are green      [cucumbers, are]
3     grapes are green  [grapes, are, green]
4     oranges are good             [oranges]
5   pineapples are big     [flowers, apples]

I want to find every index where the string in column 0 contains all of the items in the list in column 1. In this case, the output would look like this:

                     0                     1
2  cucumbers are green      [cucumbers, are]
3     grapes are green  [grapes, are, green]
4     oranges are good             [oranges]

I know I can use pandas.Series.str.contains, but that only works with a single list, and I would like to avoid iterating/looping if possible.

Answer 1

You can use a list comprehension and boolean indexing:

res = df[[all(word in x.split() for word in y) for x, y in zip(df[0], df[1])]]

print(res)

                     0                     1
2  cucumbers are green      [cucumbers, are]
3     grapes are green  [grapes, are, green]
4     oranges are good             [oranges]
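
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the table in the question, and `set.issubset` is swapped in as an equivalent way to write the `all(...)` check from the one-liner above; the `mask` name is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed layout: integer column labels 0 and 1).
df = pd.DataFrame({
    0: ["apples are good", "bananas are good", "cucumbers are green",
        "grapes are green", "oranges are good", "pineapples are big"],
    1: [["orange", "banana"], ["bananas", "bad"], ["cucumbers", "are"],
        ["grapes", "are", "green"], ["oranges"], ["flowers", "apples"]],
})

# One boolean per row: True when every word in the list appears
# as a whole word in the sentence from column 0.
mask = [set(words).issubset(sentence.split())
        for sentence, words in zip(df[0], df[1])]

res = df[mask]
print(res.index.tolist())  # [2, 3, 4]
```

Splitting the sentence first means the comparison is on whole words, so "apples" does not match inside "pineapples" in row 5. This still iterates row by row in Python; it just keeps the loop in a single comprehension.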
