Question

import re
sentence = 'Alice was not a bit hurt, and she jumped up on to her feet in a moment.'
words = ['Alice','jumped','played']

To match the words in sentence, I used the code from the answer to my last post:

[w for w in words if re.search(r'\b{}\b'.format(re.escape(w)), sentence)]

This gives me:

['Alice', 'jumped']

Now, if the words list is given in a different order (words = ['jumped','Alice','played']), I want the matches returned in the order in which they appear in sentence, i.e. I still need:

['Alice', 'jumped']

rather than

['jumped','Alice']

How should I modify the code?

Answer 1

One approach is to use the sentence as the base and keep only those of its words that appear in the other list:

sentence_words = ['Alice','jumped','played']
words = ['jumped', 'Alice']
# On Python 3, filter() returns an iterator, so wrap it in list()
in_order = list(filter(set(words).__contains__, sentence_words))
# ['Alice', 'jumped']

Or:

word_set = set(words)
in_order = [word for word in sentence_words if word in word_set]
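
The snippets above assume that sentence_words already holds the words of the sentence in order. As a minimal sketch, assuming a simple `\w+` tokenization is good enough for the question's sentence, it could be derived like this:

```python
import re

sentence = 'Alice was not a bit hurt, and she jumped up on to her feet in a moment.'
words = ['jumped', 'Alice', 'played']

# Assumption: \w+ is an adequate tokenizer here; it drops punctuation
# and would split hyphenated words or contractions.
sentence_words = re.findall(r'\w+', sentence)

# Filter the sentence's words, so the result follows sentence order.
word_set = set(words)
in_order = [word for word in sentence_words if word in word_set]
print(in_order)  # ['Alice', 'jumped']
```

Note that a word occurring several times in the sentence would also appear several times in in_order.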

Alternatively, you can build a lookup of word -> last-seen index and use it as the sort key:

lookup = {word: idx for idx, word in enumerate(sentence_words)}
words.sort(key=lookup.__getitem__)
# words is now ['Alice', 'jumped']

Or perhaps combine the two, so that words missing from the sentence are dropped before sorting:

new_words = sorted((word for word in words if word in lookup), key=lookup.get)
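
The same sort-by-position idea can probably be applied directly to the regex from the question. The following is only a sketch: match_start and positions are hypothetical names introduced here, and it assumes that ordering by the first match of each word is what is wanted.

```python
import re

sentence = 'Alice was not a bit hurt, and she jumped up on to her feet in a moment.'
words = ['jumped', 'Alice', 'played']

def match_start(word, text):
    """Return the index of the first whole-word match, or None if absent."""
    m = re.search(r'\b{}\b'.format(re.escape(word)), text)
    return m.start() if m else None

# Hypothetical helper mapping: word -> position of its first match.
positions = {w: match_start(w, sentence) for w in words}

# Keep only the words that matched, ordered by where they occur.
matched = sorted((w for w in words if positions[w] is not None),
                 key=positions.get)
print(matched)  # ['Alice', 'jumped']
```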

Answer 2

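You can build a single alternation pattern like this (escaping each word in case it contains regex metacharacters):

pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'

and use findall, which returns the matches in the order they occur in the sentence:

re.findall(pattern, sentence)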

To remove duplicates (note that a set does not preserve the order of the matches):

list(set(re.findall(pattern, sentence)))
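
Since the question is about keeping sentence order, one possible alternative to set is dict.fromkeys, which removes duplicates while keeping first-seen order (dicts preserve insertion order on Python 3.7+). A minimal sketch follows; the text variable is a made-up sentence with repeated words, used only for illustration:

```python
import re

# Hypothetical sentence with repeats, not taken from the question.
text = 'Alice jumped, and then Alice jumped again.'
words = ['jumped', 'Alice', 'played']

# One alternation pattern; re.escape guards against regex metacharacters.
pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'

matches = re.findall(pattern, text)
print(matches)  # ['Alice', 'jumped', 'Alice', 'jumped']

# dict.fromkeys drops duplicates but keeps the first occurrence order.
unique_in_order = list(dict.fromkeys(matches))
print(unique_in_order)  # ['Alice', 'jumped']
```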

How to preserve the order of regex match results
