Question

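I'm trying to find keywords in a sentence. The keywords are usually single words, but they can be multi-word combinations (such as "cost in euros"). So if I have a sentence like cost in euros of bacon, it should find cost in euros in that sentence and return true.

To do this, I'm using the following code: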

if any(phrase in line for phrase in keyword['aliases']):

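where line is the input and aliases is an array of phrases that match a keyword (for example, for cost in euros, it is ['cost in euros', 'euros', 'euro cost']).

However, I noticed that it also triggers on parts of words. For example, I had the match phrase y and the sentence trippy cake. I don't want this to return true, but it does, because it apparently finds the y in trippy. How can I check for whole words only? Originally I did this keyword search with a list of words (essentially line.split() and checking against that), but that doesn't work for multi-word keyword aliases.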
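To make the problem concrete, here is a small sketch of the two approaches described above. The variable names line and sentence are only illustrative and are not taken from the original code.

```python
line = 'trippy cake'
sentence = 'cost in euros of bacon'

# Substring test: matches the "y" at the end of "trippy".
print('y' in line)                          # True

# Word-list test: only whole words match...
print('y' in line.split())                  # False

# ...but a multi-word phrase is never an element of the word list.
print('cost in euros' in sentence.split())  # False
```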

Answer 1

This should do what you're looking for:

import re

aliases = [
    'cost.',
    '.cost',
    '.cost.',
    'cost in euros of bacon',
    'rocking euros today',
    'there is a cost inherent to bacon',
    'europe has cost in place',
    'there is a cost.',
    'I was accosted.',
    'dealing with euro costing is painful']
phrases = ['cost in euros', 'euros', 'euro cost', 'cost']

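# Collect every sentence that contains at least one phrase as whole words.
# \b is a word boundary; re.escape() keeps any regex metacharacters in a
# phrase from being interpreted as pattern syntax.
matched = list(set([
    alias
    for alias in aliases
    for phrase in phrases
    if re.search(r'\b{}\b'.format(re.escape(phrase)), alias)
    ]))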

print(matched)

Output (the order may vary, since a set is unordered):

['there is a cost inherent to bacon', '.cost.', 'rocking euros today', 'there is a cost.', 'cost in euros of bacon', 'europe has cost in place', 'cost.', '.cost']

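Basically, we grab all the matches using a compound list comprehension, with Python's re module as the test. Because more than one phrase can occur in a given alias, the same alias can show up several times, so we use set() to trim the duplicates from the list and then list() to coerce the set back into a list. The \b on each side of the phrase is a word boundary, which is why "I was accosted." and "dealing with euro costing is painful" are not matched.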
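The same test can be folded back into the any() check from the question. The following is a minimal sketch, not part of the original answer: the helper name contains_phrase is hypothetical, and re.escape() is added on the assumption that aliases are literal text rather than regex patterns.

```python
import re

def contains_phrase(line, phrases):
    """Return True if any phrase occurs in line as whole words."""
    return any(
        re.search(r'\b{}\b'.format(re.escape(phrase)), line)
        for phrase in phrases
    )

aliases = ['cost in euros', 'euros', 'euro cost']

print(contains_phrase('cost in euros of bacon', aliases))  # True
print(contains_phrase('trippy cake', ['y']))                # False
print(contains_phrase('I was accosted.', ['cost']))         # False
```

Note that \b only marks a transition between a word character and a non-word character, so a phrase that begins or ends with punctuation may need a different boundary test.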

References:

Lists: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists

List comprehensions: https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

Sets: https://docs.python.org/3/tutorial/datastructures.html#sets

re (regular expressions): https://docs.python.org/3/library/re.html#module-re
