Question

I have a list of keywords. A sample is:

 ['IO', 'IO Combination','CPI Combos']

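What I want to do now is check whether any of these keywords is present in a string. For example, if my string is: there is a IO competition coming in Summer 2018, it contains IO, so that keyword should be identified. But if the string is there is a competition coming in Summer 2018, no keyword should be identified.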

I wrote this Python code, but it also identifies the IO inside competition:

if any(word.lower() in string_1.lower() for word in keyword_list):
    print('FOUND A KEYWORD IN STRING')

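I would also like to know which keyword was identified in the string (if any). What is wrong with my code, and how can I make sure it only matches whole words?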

Answer 1

Regex solution

You need to use word boundaries here:

import re

keywords = ['IO', 'IO Combination','CPI Combos']

# re.escape() keeps any regex metacharacters inside a keyword literal
words_flat = "|".join(r'\b{}\b'.format(re.escape(word)) for word in keywords)
rx = re.compile(words_flat)

string = "there is a IO competition coming in Summer 2018"
match = rx.search(string)

if match:
    print("Found: {}".format(match.group(0)))
else:
    print("Not found")

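Here, the items of your list are joined with | and each one is wrapped in \b on both sides. You can then search with search(), which prints "Found: IO" in this example.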

It is even shorter with the comprehension passed in directly:

rx = re.compile("|".join(r'\b{}\b'.format(re.escape(word)) for word in keywords))
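The question also asks which keywords were found. One possible sketch (not part of the original answer) collects every match with finditer(). It assumes that longer keywords should win over shorter ones, so the list is sorted by length first: alternation takes the first alternative that matches, and otherwise IO would always be picked before IO Combination. The helper name find_keywords is hypothetical.

```python
import re

keywords = ['IO', 'IO Combination', 'CPI Combos']

def find_keywords(text, keywords):
    """Return every keyword that occurs as a whole word or phrase in text."""
    # Longest first, so 'IO Combination' is tried before 'IO'.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = "|".join(r'\b{}\b'.format(re.escape(word)) for word in ordered)
    rx = re.compile(pattern, flags=re.IGNORECASE)
    # finditer() yields all non-overlapping matches, left to right.
    return [m.group(0) for m in rx.finditer(text)]

print(find_keywords("there is a IO competition coming in Summer 2018", keywords))
print(find_keywords("there is a competition coming in Summer 2018", keywords))
print(find_keywords("an io combination and CPI Combos", keywords))
```

The first call reports only IO, the second reports nothing, and the third reports the two phrases as they are spelled in the input text.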

Non-regex solution

Note that for single words you can even use a non-regex solution. You just need to reorder your comprehension and use split(), like so:

found = any(word in keywords for word in string.split())

if found:
    pass  # do sth. here

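Notes

The downside of the latter is that a string like

there is a IO. competition coming in Summer 2018
#         ---^---

will not be matched, because split() produces the token IO. with the period attached, whereas the regex solution still treats IO there as a "word" (so the two approaches yield different results). In addition, because of split(), multi-word phrases such as CPI Combos cannot be found. The regex solution has the further advantage of supporting case-insensitive matching (just pass flags=re.IGNORECASE).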

It really depends on your actual requirements.
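The difference between the two approaches can be seen side by side in a small sketch. The punctuation-stripping variant in the middle is an addition for illustration, not something the answer proposes, and the variable names are made up for this example.

```python
import re
import string

keywords = ['IO', 'IO Combination', 'CPI Combos']
text = "there is a IO. competition coming in Summer 2018"

# Plain split(): the token is 'IO.' (period included), so nothing matches.
split_hit = any(word in keywords for word in text.split())

# Stripping leading/trailing punctuation from each token recovers
# single-word keywords, but still cannot find multi-word phrases.
tokens = [word.strip(string.punctuation) for word in text.split()]
stripped_hit = any(word in keywords for word in tokens)

# The regex version sees a word boundary between 'O' and '.'.
rx = re.compile("|".join(r'\b{}\b'.format(re.escape(k)) for k in keywords),
                flags=re.IGNORECASE)
regex_hit = rx.search(text)

print(split_hit, stripped_hit, regex_hit.group(0))
```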

Answer 2

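def find_index(mylist, mystring):
    # Return the index of the first item that contains mystring, or -1 if none does.
    for index, key in enumerate(mylist):
        if key.find(mystring) != -1:
            return index
    return -1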

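It iterates over your list and, for each item, checks whether your string is contained in that item. find() returns -1 when the string is not found, so any other result means it is contained. When that happens, enumerate() gives you the index of the item in which it was found. Note that this is a plain substring test, not a whole-word match.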
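The question asks for the opposite direction: whether a keyword is contained in the string. A sketch with the operands swapped might look like the following. The function name keyword_indexes is hypothetical, and because this is still a substring test, it reproduces the false positive from the question once both sides are lowercased.

```python
def keyword_indexes(keywords, text):
    """Return the indexes of all keywords that occur as substrings of text."""
    return [i for i, key in enumerate(keywords) if text.find(key) != -1]

keywords = ['IO', 'IO Combination', 'CPI Combos']

# Case-sensitive: only the standalone 'IO' is found.
exact = keyword_indexes(keywords, "there is a IO competition coming in Summer 2018")

# Lowercasing both sides finds 'io' inside 'competition' (competit-io-n),
# which is the unwanted match described in the question.
lowered = keyword_indexes([k.lower() for k in keywords],
                          "there is a competition coming in summer 2018")

print(exact, lowered)
```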

Match if any keyword from the list is present in the string
