Question

I have a set of words

words = {'thanks giving', 'cat', 'instead of', etc...}

I need to search for exact matches of these words in the table column 'description'

--------------------------------|
ID  | Description               |
--- |---------------------------|
1   | having fun   thanks giving| 
----|---------------------------|
2   |  cat eats all the food    |
----|---------------------------|
3   |  instead you can come     | 
--------------------------------

def matched_words(x, words):
    match_words = []
    for word in words:
        if word in x:
            match_words.append(word)
    return match_words

df['new_col'] = df['description'].apply(lambda x:matched_words(x,words))

Expected output:

----|---------------------------|-------------------|
ID  | Description               |matched words      |
--- |---------------------------|-------------------|
1   | having fun   thanks giving|['thanks giving']  |
----|---------------------------|------------------ |
2   |  cat eats all the food    |['cat']            |
----|---------------------------|-------------------|
3   |  instead you can come     | []                |
----------------------------------------------------

I only get matches for single tokens, e.g. ['cat']

Answer 1

The following code should give you the desired result:

import re

words = {'thanks', 'cat', 'instead of'}
phrases = [
    [1,"having fun at thanksgiving"],
    [2,"cater the food"],
    [3, "instead you can come"],
    [4, "instead of pizza"],
    [5, "thanks for all the fish"]
]

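matched_words = []
matched_pairs = []
for word in words:
    for phrase in phrases:
        # \b anchors the start of the word; \W requires one non-word character
        # after it, so group(0) includes that trailing character.
        result = re.search(r'\b' + word + r'\W', phrase[1])
        if result:
            matched_words.append(result.group(0))
            matched_pairs.append([result.group(0), phrase])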

print(matched_words)
print(matched_pairs)

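The relevant part, i.e. the regex bit re.search(r'\b'+word+'\W', phrase[1]), searches for cases where the search string is found starting at a word boundary \b (the empty string at the edge of a word) and ending with a non-word character \W. This should ensure that we only find whole-string matches. Nothing else needs to be done to the text being searched.

Of course, you can use whatever names you like in place of words, phrases, matched_words and matched_pairs.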
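
As a rough sketch of how the trailing \W behaves, the snippet below compares it with a variant that uses \b on both sides. The helper names find_with_trailing_nonword and find_with_boundaries are made up for this illustration, and the re.escape call is an addition that is not part of the code above.

```python
import re

def find_with_trailing_nonword(word, text):
    # Pattern from the answer: word boundary, the word, then one non-word character.
    # re.escape is an added precaution in case the word contains regex metacharacters.
    return re.search(r'\b' + re.escape(word) + r'\W', text)

def find_with_boundaries(word, text):
    # Illustrative variant: word boundaries on both sides, no trailing character consumed.
    return re.search(r'\b' + re.escape(word) + r'\b', text)

# 'cat' followed by a space: both patterns match.
print(bool(find_with_trailing_nonword('cat', 'cat eats all the food')))
print(bool(find_with_boundaries('cat', 'cat eats all the food')))

# 'cat' inside 'cater': neither pattern matches.
print(bool(find_with_trailing_nonword('cat', 'cater the food')))
print(bool(find_with_boundaries('cat', 'cater the food')))

# 'cat' as the very last word: \W has no character left to consume,
# so only the \b variant matches.
print(bool(find_with_trailing_nonword('cat', 'food for the cat')))
print(bool(find_with_boundaries('cat', 'food for the cat')))
```

If the search word can appear at the very end of a description, the \b variant may be the safer choice, since \W needs an actual character to follow the word.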

Hope this helps!

Answer 2

import re
words = {'thanks', 'cat', 'instead of'}

samples = [
    (1, 'having fun at thanksgiving'),
    (2, 'cater the food'),
    (3, 'instead you can come'),
    (4, 'instead of you can come'),
]

for sample_id, description in samples:
    for word in words:
        # \b on both sides matches the word only as a whole word
        if re.search(r'\b' + word + r'\b', description):
            print("'%s' in '%s'" % (word, description))
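
To tie this back to the question, here is a minimal sketch of a matched_words function that uses the same \b...\b pattern and returns a list per description. The whitespace normalization and the re.escape call are assumptions added for this example, and plain lists stand in for the DataFrame so the snippet runs without pandas.

```python
import re

words = {'thanks giving', 'cat', 'instead of'}

def matched_words(text, words):
    # Collapse runs of whitespace so 'thanks   giving' still matches 'thanks giving'
    # (an assumption about how the descriptions are spaced).
    normalized = ' '.join(text.split())
    # Keep only the words/phrases that occur as whole words; sorted for a stable order.
    return sorted(
        word for word in words
        if re.search(r'\b' + re.escape(word) + r'\b', normalized)
    )

descriptions = [
    'having fun   thanks giving',
    ' cat eats all the food',
    ' instead you can come',
]

for description in descriptions:
    print(matched_words(description, words))
```

With pandas, the same function should slot into the call from the question, df['description'].apply(lambda x: matched_words(x, words)).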

How to perform exact string matching in Python

2 answers: