Question

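I want to use a regular expression to get the intersection of two word lists. The re module is implemented in C, so it runs faster, and that matters a lot in this particular case. My code almost works, but it also matches embedded words, for example 'buy' inside 'buyer'.

Some code may explain it better. This is what I have so far: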

import re

re.findall(r"(?=(" + '|'.join(['buy', 'sell', 'gilt']) + r"))", ' '.join(['aabuya', 'gilt', 'buyer']))
>> ['buy', 'gilt', 'buy']

What I want is something like this (exactfindall is a made-up name, not a real function in re):

re.exactfindall(['buy', 'sell', 'gilt'], ['aabuya', 'gilt', 'buyer'])
>> ['gilt']

Thanks.

Answer 1

To do this with a regexp, the simplest way is probably to add word boundaries (\b) to the pattern, outside the capture group, which gives you:

import re

re.findall(r"\b(?=(" + '|'.join(['buy', 'sell', 'gilt']) + r")\b)",
    ' '.join(['aabuya', 'gilt', 'buyer']))

This outputs ['gilt'], as requested.
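If the word list is not fixed, it may be safer to escape each word and compile the pattern once. The following is only a sketch of that idea: the name exact_findall is hypothetical (it is not part of re), and it uses a plain non-capturing group instead of the lookahead, on the assumption that whole-word matches never need to overlap.

```python
import re

def exact_findall(words, candidates):
    # Hypothetical helper, not part of the re module.
    # re.escape keeps any regex metacharacters in the words literal.
    alternation = "|".join(re.escape(w) for w in words)
    # \b on both sides so only whole words match.
    pattern = re.compile(r"\b(?:" + alternation + r")\b")
    return pattern.findall(" ".join(candidates))

result = exact_findall(['buy', 'sell', 'gilt'], ['aabuya', 'gilt', 'buyer'])
print(result)  # ['gilt']
```

Note that \b treats any non-word character as a boundary, so 'buy' would still match inside something like 'buy-out'. Whether that counts as an exact match depends on your data.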

Answer 2

listgiven=['aabuya', 'gilt', 'buyer']
listtomatch=['buy', 'sell', 'gilt']
exactmatch = [x for x in listgiven if x in listtomatch]
print(exactmatch)
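
Since speed matters in the question, one possible variation is to turn the list being searched into a set first, so each membership test is a hash lookup instead of a scan over the list. This is a sketch under that assumption; the name lookup is introduced here only for illustration, and it will make a noticeable difference only when listtomatch is large.

```python
listgiven = ['aabuya', 'gilt', 'buyer']
listtomatch = ['buy', 'sell', 'gilt']

# Build the set once; "x in lookup" is then a hash lookup on average.
lookup = set(listtomatch)

# Keeps the order (and any duplicates) of listgiven.
exactmatch = [x for x in listgiven if x in lookup]
print(exactmatch)  # ['gilt']
```

If order and duplicates do not matter, set(listgiven) & set(listtomatch) gives the intersection directly.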