How to preserve the order of regex match results

Asked: 2014-04-04 17:14:20

Tags: python regex python-2.7

sentence = 'Alice was not a bit hurt, and she jumped up on to her feet in a moment.'
words = ['Alice','jumped','played']

To match the words in sentence, I used the code from an answer to my last post:
[w for w in words if re.search(r'\b{}\b'.format(re.escape(w)), sentence)]

This gives me:

['Alice', 'jumped']

Now, if the words list is given in a different order (words = ['jumped','Alice','played']), I want the matches to appear in the order in which they occur in sentence, i.e. I still need:

['Alice', 'jumped']

instead of

['jumped','Alice']

How should I modify the code?

2 answers:

Answer 0: (score: 3)

One approach is to take the sentence's words as the basis and keep only those that appear in the other list:

sentence_words = ['Alice','jumped','played']
words = ['jumped', 'Alice']
in_order = filter(set(words).__contains__, sentence_words)
# ['Alice', 'jumped']

Or:

word_set = set(words)
in_order = [word for word in sentence_words if word in word_set]
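
The snippets above take sentence_words as given. A minimal sketch of how it might be derived from the question's sentence follows. It assumes a "word" is a run of \w characters, which is a simplification and not part of the original answer. Note that a word occurring several times in the sentence would appear several times in the result.

```python
import re

sentence = 'Alice was not a bit hurt, and she jumped up on to her feet in a moment.'
words = ['jumped', 'Alice', 'played']

# Assumption: split the sentence into runs of word characters,
# which drops punctuation such as ',' and '.'.
sentence_words = re.findall(r'\w+', sentence)

# Walk the sentence in order and keep only the words we are looking for.
word_set = set(words)
in_order = [word for word in sentence_words if word in word_set]
print(in_order)  # ['Alice', 'jumped']
```

Because the iteration runs over the sentence and not over words, the order in which words is given no longer matters.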

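Alternatively, you can build a lookup from each word to the index at which it was last seen, and sort by it (note that __getitem__ raises KeyError if words contains a word that is not in the lookup):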

lookup = {word: idx for idx, word in enumerate(sentence_words)}
words.sort(key=lookup.__getitem__)
# words is now ['Alice', 'jumped'] (sorted in place)

Or perhaps combine the two:

new_words = sorted((word for word in words if word in lookup), key=lookup.get)
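
Here is a self-contained sketch of the combined lookup approach applied to the question's data. The helper name first_positions is hypothetical, and the \w+ tokenization is an assumption. Unlike the dict comprehension, which keeps the last index of a repeated word, this variant records the first occurrence.

```python
import re

sentence = 'Alice was not a bit hurt, and she jumped up on to her feet in a moment.'
words = ['jumped', 'Alice', 'played']

def first_positions(text):
    # Hypothetical helper: map each token to the index of its first occurrence.
    lookup = {}
    for idx, token in enumerate(re.findall(r'\w+', text)):
        lookup.setdefault(token, idx)  # keeps the earliest index only
    return lookup

lookup = first_positions(sentence)

# Drop words that never occur, then sort the rest by position in the sentence.
new_words = sorted((w for w in words if w in lookup), key=lookup.get)
print(new_words)  # ['Alice', 'jumped']
```

Filtering with `w in lookup` before sorting is what avoids the KeyError that a bare `lookup.__getitem__` would raise for 'played'.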

Answer 1: (score: 1)

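You can build your pattern like this (re.escape guards against words that contain regex metacharacters):

 pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'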

and use findall, which returns the matches in the order they occur in the sentence:

 re.findall(pattern, sentence)
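
For reference, a runnable sketch of the alternation approach on the question's data, with each word passed through re.escape as in the question's original code. re.findall scans the string left to right, so the result follows the sentence and not the order of words.

```python
import re

sentence = 'Alice was not a bit hurt, and she jumped up on to her feet in a moment.'
words = ['jumped', 'Alice', 'played']

# One alternation of all (escaped) words, bounded by word boundaries.
pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'

matches = re.findall(pattern, sentence)
print(matches)  # ['Alice', 'jumped']
```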

To remove duplicates (note that set() does not preserve the order of the matches):

list(set(re.findall(pattern, sentence)))
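
Since the goal of the question is to keep sentence order, an order-preserving way to drop duplicates may be preferable to set(). One possible sketch uses collections.OrderedDict, which is available in Python 2.7. The example sentence with repeated words is made up for illustration.

```python
import re
from collections import OrderedDict

# Hypothetical sentence in which the target words repeat.
sentence = 'Alice jumped, and then Alice jumped again.'
words = ['jumped', 'Alice', 'played']

pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
matches = re.findall(pattern, sentence)

# OrderedDict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_in_order = list(OrderedDict.fromkeys(matches))
print(unique_in_order)  # ['Alice', 'jumped']
```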