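I'm struggling to write a regular expression that matches when A and B share at least two words, as in case 1 below. I have just found a way to exclude "does" (or any other dictionary word) from input A, so case 2 is not a problem. In case 1 the shared words are Wakanda and exist, so A should match B. Assume that words such as do, in, and the have already been removed.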
CASE 1
A -> Do Wakanda exist in the world?
B -> Does Wakanda exist?
>> A should match B
exclude = ['do', 'in', 'the']
A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"
split_A = A.lower().split()
final_A = [i if i not in exclude else '' for i in split_A]
A = " ".join(' '.join(final_A).strip().split())
CASE 1
A -> wakanda exist world?
B -> Does Wakanda exist?
>> A should match B
CASE 2
A -> Does Atlantis exist in our world?
B -> Does Wakanda exist?
>> A should not match B
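Answer 0 (score: 2)
You can use set operations to check whether the two sentences match. No regular expression is needed, but you do need some preprocessing: remove the ?, lowercase the sentences, and so on: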
A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"
A2 = "Does Atlantis exist in our world?"
B2 = "Does Wakanda exist?"
exclude = ['do', 'in', 'the', 'does']
def a_match_b(a, b):
    a = set(a.replace('?', '').lower().split()) - set(exclude)
    b = set(b.replace('?', '').lower().split()) - set(exclude)
    return len(a.intersection(b)) > 1
print(a_match_b(A, B))
print(a_match_b(A2, B2))
Output:
True
False
Edit:
As @tobias_k said, you can use a regular expression to find the words, so this also works:
import re
A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"
A2 = "Does Atlantis exist in our world?"
B2 = "Does Wakanda exist?"
exclude = ['do', 'in', 'the', 'does']
def a_match_b(a, b):
    words_a = re.findall(r'[\w]+', a.lower())
    words_b = re.findall(r'[\w]+', b.lower())
    a = set(words_a) - set(exclude)
    b = set(words_b) - set(exclude)
    return len(a.intersection(b)) > 1
print(a_match_b(A, B))
print(a_match_b(A2, B2))
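The same idea can be wrapped so that the stop-word list and the required number of shared words are parameters rather than globals. This is only a sketch: the names shared_words, sentences_match, and min_shared are hypothetical and do not come from the answer above.

```python
import re

# Precompiled pattern for runs of word characters.
WORD_RE = re.compile(r"\w+")

def shared_words(a, b, exclude=frozenset()):
    """Return the lowercase words that appear in both a and b, minus exclude."""
    words_a = set(WORD_RE.findall(a.lower())) - set(exclude)
    words_b = set(WORD_RE.findall(b.lower())) - set(exclude)
    return words_a & words_b

def sentences_match(a, b, exclude=frozenset(), min_shared=2):
    """True when a and b share at least min_shared non-excluded words."""
    return len(shared_words(a, b, exclude)) >= min_shared

if __name__ == "__main__":
    stop = {"do", "in", "the", "does"}
    print(sentences_match("Do Wakanda exist in the world?", "Does Wakanda exist?", stop))
    print(sentences_match("Does Atlantis exist in our world?", "Does Wakanda exist?", stop))
```

Returning the shared set from a separate helper makes it easy to inspect which words caused a match when a result looks surprising.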
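Answer 1 (score: 0)
Here is a more "pure" regex solution, provided it runs in whatever regex engine you are using.
Join the two strings with "||" and try to match the result against this regular expression: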
(?i).*?(\b\w+\b).*?(\b\w+\b).*?\|\|(?:.*\b\1\b.*\b\2\b.*|.*\b\2\b.*\b\1\b.*)
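So running it on the string wakanda exist world||Does Wakanda exist? produces a match with two groups: wakanda and exist.
If you run it on wakanda xist ello world||does exist wakanda hello, it does not match, because only one word (wakanda) appears as a whole word on both sides.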
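For readers working in Python, as the question does, the "||"-joined pattern above might be tried as follows. This is a sketch that assumes Python's re module; it passes re.IGNORECASE as a flag instead of using the inline (?i) prefix, and the helper name two_shared_words is made up for illustration.

```python
import re

# Same pattern as above, with case-insensitivity supplied as a flag.
PAIR_RE = re.compile(
    r".*?(\b\w+\b).*?(\b\w+\b).*?\|\|"
    r"(?:.*\b\1\b.*\b\2\b.*|.*\b\2\b.*\b\1\b.*)",
    re.IGNORECASE,
)

def two_shared_words(a, b):
    """Return the two captured words if a and b share two whole words, else None."""
    m = PAIR_RE.match(a + "||" + b)
    return m.groups() if m else None

if __name__ == "__main__":
    print(two_shared_words("wakanda exist world", "Does Wakanda exist?"))
    print(two_shared_words("wakanda xist ello world", "does exist wakanda hello"))
```

The pattern relies on backtracking over every pair of words in the first sentence, so it is best kept to short inputs. It also assumes that neither sentence contains "||" itself.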
Alternatively, convert "wakanda exist world?" into "\bwakanda\b|\bexist\b|\bworld\b" and run that on the second string to get a match, for example wakanda. Then remove wakanda from the list and run it again. If you get a second match, you are good.
Since you have not tagged the question with Python, and I don't know Python, I'll provide JavaScript that does this; you can adapt it as needed.
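var simplifiedSentence1 = "wakanda exist world?";
var simplifiedSentence2 = "Does Wakanda exist?";
// Build an alternation such as \bwakanda\b|\bexist\b|\bworld\b from sentence 1.
var matchExp = new RegExp(".*?("
+ simplifiedSentence1
.replace(/\W+/g,"|")
.replace(/^\||\|$/g,"")
.replace(/(\w+)/g,"\\b$1\\b")
+ ")","i");
// exec returns null when no word matches, so guard before reading group 1.
var firstMatch = matchExp.exec(simplifiedSentence2);
var TwoWordsMatched = false;
if (firstMatch !== null) {
  // Remove the first matched word from sentence 2, then look for a second match.
  var matchExp2 = new RegExp("\\b" + firstMatch[1] + "\\b\\W*", "i");
  TwoWordsMatched = matchExp.test(simplifiedSentence2.replace(matchExp2, ""));
}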
TwoWordsMatched is true if two words match between the two sentences, and false if one or fewer match.
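A rough Python rendering of the alternation approach might look like this. It follows the prose description (drop the matched word from the list, then search again) rather than translating the JavaScript line by line, and the function name two_words_match is hypothetical. It assumes the first sentence has already had its stop words removed.

```python
import re

def two_words_match(simplified_a, b):
    """True if at least two distinct words of simplified_a occur as whole words in b."""
    # Unique lowercase words from the first sentence, in their original order.
    remaining = list(dict.fromkeys(re.findall(r"\w+", simplified_a.lower())))
    hits = 0
    while remaining and hits < 2:
        # One capture group per word, so lastindex identifies which word matched.
        pattern = "|".join(r"\b(" + re.escape(w) + r")\b" for w in remaining)
        m = re.search(pattern, b, flags=re.IGNORECASE)
        if m is None:
            break
        remaining.pop(m.lastindex - 1)  # drop the matched word, then search again
        hits += 1
    return hits >= 2

if __name__ == "__main__":
    print(two_words_match("wakanda exist world?", "Does Wakanda exist?"))
    print(two_words_match("atlantis exist our world?", "Does Wakanda exist?"))
```

Removing the word from the list instead of from the second sentence means a word repeated in the second sentence is only counted once.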