Question

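How do I check for the exact occurrence of a word in a string?

I need a word such as "king" to be treated as not matching when it is immediately followed by a question mark, as in the examples below.

unigrams: this should be False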

In [1]: answer = "king"
In [2]: context = "we run with the king? on sunday"

n_grams: this should be False

In [1]: answer = "king tut"
In [2]: context = "we run with the king tut? on sunday"

unigrams: this should be True

In [1]: answer = "king"
In [2]: context = "we run with the king on sunday"

n_grams: this should be True

In [1]: answer = "king tut"
In [2]: context = "we run with the king tut on sunday"

As people have mentioned, the unigram case can be handled by splitting the string into a list of words, but that does not work for n_grams.
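To make the limitation concrete, here is a minimal sketch using the example strings above. The variable names are only for illustration.

```python
# Splitting on whitespace gives a list of single words.
plain_words = "we run with the king tut on sunday".split()
punct_words = "we run with the king? on sunday".split()

# A single word can be tested directly against the list.
unigram_found = "king" in plain_words

# "king?" is a different list item from "king", so this is rejected as wanted.
unigram_with_punct = "king" in punct_words

# A phrase never equals a single list item, so this fails even though
# the words "king" and "tut" are both present and adjacent.
ngram_found = "king tut" in plain_words

print(unigram_found, unigram_with_punct, ngram_found)
```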

After reading some posts, I think I should try using a lookbehind, but I'm not sure.

Answer 1


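You don't need a regular expression.

If you are looking for a single keyword, test for membership in the split string. With the first example from the question:

>>> answer in context.split()
False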

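For a multi-word answer, all([ans in context.split() for ans in answer.split()]) will work with

"king tut"

but that depends on whether you also want to match strings like the following, where the words are present but not adjacent:

"we tut with the king"

If you don't, you still don't need a regular expression (although you should probably use one), because you only want to consider whole terms, which .split() separates correctly by default.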
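The order-insensitive behavior of the all(...) check can be seen in a short sketch. The helper name all_words_present is hypothetical and only wraps the expression for reuse.

```python
def all_words_present(answer, context):
    # Hypothetical helper: True if every word of `answer` occurs
    # somewhere in `context` as a whole whitespace-separated word.
    words = context.split()
    return all(ans in words for ans in answer.split())

# Words adjacent and in order: matches.
adjacent = all_words_present("king tut", "we run with the king tut on sunday")

# Words present but scattered: this also matches, which may not be wanted.
out_of_order = all_words_present("king tut", "we tut with the king")

# "tut?" is not the word "tut", so this is rejected.
with_punctuation = all_words_present("king tut", "we run with the king tut? on sunday")

print(adjacent, out_of_order, with_punctuation)
```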

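As for performance: on a worst-case string for .split(), this comes out at about half of what the regular expression achieves on an ordinary string.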

def ngram_in(match, string):
    # Split both inputs on whitespace so only whole words are compared.
    matches = match.split()
    words = string.split()
    matches_len = len(matches)
    # An empty answer never matches.
    if matches_len == 0:
        return False
    # Slide a window of matches_len words across the context and
    # compare each window with the words of the answer.
    for index in range(len(words) - matches_len + 1):
        if words[index:index + matches_len] == matches:
            return True
    return False
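Another regex-free option, shown here only as a sketch and not as part of the function above, is to normalize whitespace and pad both strings with a space so that a plain substring test can only succeed on word boundaries. The name contains_phrase is hypothetical.

```python
def contains_phrase(answer, context):
    # Collapse runs of whitespace to single spaces and pad both ends,
    # so every word in the context is surrounded by exactly one space.
    padded_context = " " + " ".join(context.split()) + " "
    padded_answer = " " + " ".join(answer.split()) + " "
    # The padded answer can only occur where it is delimited by spaces.
    return padded_answer in padded_context

cases = [
    ("king", "we run with the king? on sunday"),
    ("king tut", "we run with the king tut? on sunday"),
    ("king", "we run with the king on sunday"),
    ("king tut", "we run with the king tut on sunday"),
]
results = [contains_phrase(answer, context) for answer, context in cases]
print(results)
```

This treats any whitespace as a separator, in the same way .split() does, and it rejects "king?" because the character after "king" is not a space.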

Answer 2

Use a regular expression like this:

import re
reg_answer = re.compile(r"(?<!\S)" + re.escape(answer) + r"(?!\S)")

See the Python demo.

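Details:

(?<!\S) - a negative lookbehind that makes sure the match is preceded by whitespace or the start of the string
re.escape(answer) - a preprocessing step that makes all special characters in the search term be treated as literal characters
(?!\S) - a negative lookahead that makes sure the match is followed by whitespace or the end of the string.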
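The pattern can be tried against the four examples from the question. This is a minimal sketch; the wrapper name contains_exact is hypothetical.

```python
import re

def contains_exact(answer, context):
    # Build the pattern described above: the escaped answer must not be
    # preceded or followed by a non-whitespace character.
    pattern = re.compile(r"(?<!\S)" + re.escape(answer) + r"(?!\S)")
    return pattern.search(context) is not None

cases = [
    ("king", "we run with the king? on sunday"),
    ("king tut", "we run with the king tut? on sunday"),
    ("king", "we run with the king on sunday"),
    ("king tut", "we run with the king tut on sunday"),
]
results = [contains_exact(answer, context) for answer, context in cases]
print(results)
```

Because the lookarounds test for whitespace and not for word characters, any attached punctuation blocks the match, and so do longer words such as "kingdom".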

Answer 3

Why not just check:

if answer in context:
    pass  # do stuff

See this post for details.
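Keep in mind that in applied to two strings is a substring test. A short sketch with the strings from the question shows what it returns; whether that is acceptable depends on the requirement stated there.

```python
answer = "king"

# Plain substring containment: no notion of word boundaries.
plain = answer in "we run with the king on sunday"
followed_by_punctuation = answer in "we run with the king? on sunday"
inside_longer_word = answer in "the kingdom"

print(plain, followed_by_punctuation, inside_longer_word)
```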

