Match at least two words from the input against a statement

Time: 2018-07-24 10:37:25

Tags: regex

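I'm struggling to write a regular expression that matches at least two words, so that A matches B in Case 1. I have already found a way to exclude "does" or any dictionary word from input A, so Case 2 is not a problem. In Case 1, assume that "Wakanda" and "exist" in A should match B, and assume that words such as "do", "in", and "the" have already been removed.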

CASE 1
A -> Do Wakanda exist in the world?
B -> Does Wakanda exist?
>> A should match B

exclude = ['do', 'in', 'the']
A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"
split_A = A.lower().split()
final_A = [i if i not in exclude else '' for i in split_A]
A = " ".join(' '.join(final_A).strip().split())

CASE 1 (after removing the excluded words)
A -> wakanda exist world?
B -> Does Wakanda exist?
>> A should match B

CASE 2
A -> Does Atlantis exist in our world?
B -> Does Wakanda exist?
>> A should not match B

2 Answers:

Answer 0 (score: 2)

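You can use set operations to check whether the two sentences match (no regex needed, but you do need some preprocessing: remove the "?", lowercase the sentences, and so on):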

A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"

A2 = "Does Atlantis exist in our world?"
B2 = "Does Wakanda exist?"

exclude = ['do', 'in', 'the', 'does']

def a_match_b(a, b):
    a = set(a.replace('?', '').lower().split()) - set(exclude)
    b = set(b.replace('?', '').lower().split()) - set(exclude)
    return len(a.intersection(b)) > 1

print(a_match_b(A, B))
print(a_match_b(A2, B2))

The output is:

True
False

Edit:

As @tobias_k pointed out, you can use a regexp to find the words, so this works too:

import re

A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"

A2 = "Does Atlantis exist in our world?"
B2 = "Does Wakanda exist?"

exclude = ['do', 'in', 'the', 'does']

def a_match_b(a, b):
    words_a = re.findall(r'[\w]+', a.lower())
    words_b = re.findall(r'[\w]+', b.lower())
    a = set(words_a) - set(exclude)
    b = set(words_b) - set(exclude)
    return len(a.intersection(b)) > 1

print(a_match_b(A, B))
print(a_match_b(A2, B2))
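
To see why the first pair matches and the second does not, it can help to look at the shared words themselves rather than only their count. The following is a minimal sketch along the same lines as the code above; the helper name shared_words and the min_shared parameter are assumptions added for illustration, not part of the original answer.

```python
import re

# Same stop words as in the answer above, stored as a set for the set difference.
EXCLUDE = {"do", "in", "the", "does"}

def shared_words(a, b, exclude=EXCLUDE):
    """Return the non-excluded words that occur in both sentences."""
    words_a = set(re.findall(r"\w+", a.lower())) - exclude
    words_b = set(re.findall(r"\w+", b.lower())) - exclude
    return words_a & words_b

def a_match_b(a, b, min_shared=2):
    # min_shared is a hypothetical knob: how many common words count as a match.
    return len(shared_words(a, b)) >= min_shared

print(shared_words("Do Wakanda exist in the world?", "Does Wakanda exist?"))
print(shared_words("Does Atlantis exist in our world?", "Does Wakanda exist?"))
```

For the second pair only "exist" is shared, which is why it falls below the two-word threshold.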

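Answer 1 (score: 0)

Edit:

If it runs in whatever regex engine you are using, here is a more "pure" regex solution:

Join the two strings with "||" and try to match the result against this regex: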

(?i).*?(\b\w+\b).*?(\b\w+\b).*?\|\|(?:.*\b\1\b.*\b\2\b.*|.*\b\2\b.*\b\1\b.*)

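So running it on the string "wakanda exist world||Does Wakanda exist?" produces a match, with the two groups capturing "wakanda" and "exist".

If you run it on "wakanda xist ello world||does exist wakanda hello", it will not match, because only one word ("wakanda") is shared between the two halves...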
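
Since the question uses Python, here is a small sketch of how the pattern above might be tried with Python's re module, which supports the inline (?i) flag and the \1 and \2 backreferences. The function name shared_pair is an assumption for illustration, and the sketch assumes neither input contains "||" itself.

```python
import re

# Pattern copied from the answer; \1 and \2 refer back to the two captured words.
PAIR_PATTERN = re.compile(
    r"(?i).*?(\b\w+\b).*?(\b\w+\b).*?\|\|"
    r"(?:.*\b\1\b.*\b\2\b.*|.*\b\2\b.*\b\1\b.*)"
)

def shared_pair(a, b):
    """Return two words of `a` that both occur in `b`, or None if there is no such pair."""
    m = PAIR_PATTERN.match(a + "||" + b)
    return m.groups() if m else None

print(shared_pair("wakanda exist world", "Does Wakanda exist?"))
print(shared_pair("wakanda xist ello world", "does exist wakanda hello"))
```

The first call returns the pair of captured words and the second returns None. Note that this pattern backtracks over every pair of words in the first sentence, so it is best kept to short inputs.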

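Another, more verbose and extensible solution:

Convert "wakanda exist world?" into "\bwakanda\b|\bexist\b|\bworld\b" and run it against the second string to get one match, for example "wakanda". Then remove "wakanda" and run it again. If you get a second match, you are good.

Since you did not add Python as a language tag, and I do not know Python, I will provide JavaScript that does this, and you can adapt it as needed.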

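var simplifiedSentence1 = "wakanda exist world?";
var simplifiedSentence2 = "Does Wakanda exist?";

// Build an alternation such as \bwakanda\b|\bexist\b|\bworld\b from sentence 1.
var matchExp = new RegExp(".*?("
    + simplifiedSentence1
        .replace(/\W+/g,"|")
        .replace(/^\||\|$/g,"")   // "g" so both a leading and a trailing "|" are stripped
        .replace(/(\w+)/g,"\\b$1\\b")
    + ")","i");
// exec() returns null when no word matches, so check before indexing.
var firstMatch = matchExp.exec(simplifiedSentence2);
var TwoWordsMatched = false;
if (firstMatch !== null) {
    // Remove every occurrence of the first matched word, then look for a second one.
    var matchExp2 = new RegExp("\\b" + firstMatch[1] + "\\b\\W*", "gi");
    TwoWordsMatched = matchExp.test(simplifiedSentence2.replace(matchExp2, ""));
}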

TwoWordsMatched is true if two words match between the two statements, and false if one or fewer words match.
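
For readers working in Python, as the question does, the same two-pass idea could be sketched roughly as follows. This is an adaptation under assumptions rather than the answerer's code: the function name two_words_matched is hypothetical, and re.escape is added defensively even though \w+ words contain no regex metacharacters.

```python
import re

def two_words_matched(sentence1, sentence2):
    """Return True if at least two distinct words of sentence1 occur in sentence2."""
    words = re.findall(r"\w+", sentence1)
    if not words:
        return False
    # Alternation of whole-word patterns, e.g. \bwakanda\b|\bexist\b|\bworld\b
    pattern = re.compile(
        "|".join(r"\b" + re.escape(w) + r"\b" for w in words),
        re.IGNORECASE,
    )
    first = pattern.search(sentence2)
    if first is None:
        return False
    # Remove every occurrence of the first matched word, then search again.
    remover = re.compile(r"\b" + re.escape(first.group()) + r"\b\W*", re.IGNORECASE)
    return pattern.search(remover.sub("", sentence2)) is not None

print(two_words_matched("wakanda exist world?", "Does Wakanda exist?"))
print(two_words_matched("atlantis exist our world?", "Does Wakanda exist?"))
```

The first call prints True and the second prints False, since only "exist" is shared in the second pair.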