Question

Suppose I have a list lst containing hundreds of thousands of strings. Suppose I also have a list of strings, strings_to_match, for example:

strings_to_match = ['foo', 'bar', 'hello']

I want to find the strings in lst that contain all of the strings in strings_to_match, in the same order.

For example, if lst is

[ 'foo-yes-bar', 'hello foo fine bar', 'abcdf foo,bar, hello?']

then result should be 'abcdf foo,bar, hello?', because that string contains all of the strings in strings_to_match and they appear in the same order.

I have the following:

result = [x for x in lst if re.search(my_pattern, x)]

but I don't know how to define my_pattern in terms of strings_to_match.

Answer 1

I don't think a regex is necessary:

>>> lst = [ 'foo-yes-bar', 'hello foo fine bar']
>>> strings_to_match = ['foo', 'bar', 'hello']
>>> [x for x in lst if all(s in x for s in strings_to_match)]
['hello foo fine bar']
>>>

But if you want to use a regex, I think this would work:

[x for x in lst if all(re.search(s, x) for s in strings_to_match)]

Edit

Oh, okay, since you want to respect the order, you can do this:

[x for x in lst if re.search(".*".join(map(re.escape, strings_to_match)), x)]

My post was written for your original question, though.
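In the spirit of the opening remark that a regex may not be needed, the ordered check can also be sketched with plain str.find. This is only an illustration: the helper name contains_in_order is hypothetical and not from the original post, and the sample data is taken from the question.

```python
def contains_in_order(text, needles):
    """Return True if every needle occurs in text, each after the previous one."""
    pos = 0
    for needle in needles:
        idx = text.find(needle, pos)  # search only past the previous match
        if idx == -1:
            return False
        pos = idx + len(needle)
    return True

lst = ['foo-yes-bar', 'hello foo fine bar', 'abcdf foo,bar, hello?']
strings_to_match = ['foo', 'bar', 'hello']

result = [x for x in lst if contains_in_order(x, strings_to_match)]
print(result)  # ['abcdf foo,bar, hello?']
```

Taking the leftmost occurrence each time is enough here, since an earlier match never rules out a later one.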

Answer 2

Answering the updated question: you can use

my_pattern = ".*".join(map(re.escape, strings_to_match))

to match any string that contains the strings in strings_to_match in the given order.
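To make the pattern concrete, here is a small sketch of what the join produces and why re.escape is there. The extra inputs ('a.b', 'axb then c', the multi-line string) are made up for illustration. Note that "." does not match a newline by default, so re.DOTALL may be needed if the terms can be separated by line breaks.

```python
import re

strings_to_match = ['foo', 'bar', 'hello']
my_pattern = ".*".join(map(re.escape, strings_to_match))
print(my_pattern)  # foo.*bar.*hello

# re.escape keeps metacharacters in the search terms literal.
tricky = ".*".join(map(re.escape, ['a.b', 'c']))
print(bool(re.search(tricky, 'a.b then c')))  # True
print(bool(re.search(tricky, 'axb then c')))  # False

# Without re.DOTALL, ".*" stops at a newline.
multiline = 'foo\nbar\nhello'
print(bool(re.search(my_pattern, multiline)))             # False
print(bool(re.search(my_pattern, multiline, re.DOTALL)))  # True
```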

To filter the list, you can use either a list comprehension or filter():

result = filter(re.compile(my_pattern).search, lst)

Using filter() will be slightly more efficient in this particular case.
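One caveat if you are on Python 3: filter() returns a lazy iterator there rather than a list, so wrap it in list() if you need to index the result or iterate over it more than once. A minimal sketch using the data from the question:

```python
import re

lst = ['foo-yes-bar', 'hello foo fine bar', 'abcdf foo,bar, hello?']
strings_to_match = ['foo', 'bar', 'hello']

# Compile once and reuse the bound search method as the predicate.
search = re.compile(".*".join(map(re.escape, strings_to_match))).search

# list() materializes the lazy iterator that filter() returns on Python 3.
result = list(filter(search, lst))
print(result)  # ['abcdf foo,bar, hello?']
```

Compiling the pattern once up front avoids handing the pattern string to re.search for every element of a list with hundreds of thousands of entries.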

Finding a sublist of strings that match a regex built from another list, with the terms required to appear in order
