How do I find a matching pattern in a string regardless of word order?

Asked: 2019-06-11 13:53:17

Tags: regex python-3.x string nlp

I am trying to match a pattern between two strings. For example, I have

pattern_search = ['education four year'] 
string1 = 'It is mandatory to have at least of four years of professional education'
string2 = 'need to have education four years with professional degree'

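I am looking for a way to tell whether pattern_search matches string1 and string2.

The match / search / findall functions from the regex library do not help here. string1 contains all the required words, but not in the same order; string2 contains an extra word, and one word appears in its plural form.

Currently I preprocess pattern_search, split it and the two strings into words, and compare every word of the pattern against every word of string1 and string2. Is there a better way to find a match between the sentences?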

3 answers:

Answer 0: (score: 2)

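You should take a good look at the difflib library, specifically the get_close_matches function, which returns words that are "close enough" to a given word, so it also covers words that do not match exactly. Be sure to adjust the threshold (the cutoff= argument) accordingly.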

from difflib import get_close_matches
from re import sub

pattern_search = 'education four year'
string1 = 'It is mandatory to have at least of four years of professional education'
string2 = 'need to have education four years with professional degree'
string3 = 'We have four years of military experience'

def match(string, pattern):
  pattern = pattern.lower().split()
  words = set(sub(r"[^a-z0-9 ]", "", string.lower()).split())  # Sanitize input
  return all(get_close_matches(word, words, cutoff=0.8) for word in pattern)

print(match(string1, pattern_search))  # True
print(match(string2, pattern_search))  # True
print(match(string3, pattern_search))  # False

If you want pattern_search to be a list of patterns, you should probably loop over it and call the match function for each pattern.
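A minimal sketch of that loop, reusing the match function from above. The helper name match_any and the second pattern in pattern_list are hypothetical and only serve as illustration; whether a string should match any pattern or all of them depends on your use case.

```python
from difflib import get_close_matches
from re import sub

def match(string, pattern):
    # Same logic as in the answer: every pattern word needs a close match.
    pattern = pattern.lower().split()
    words = set(sub(r"[^a-z0-9 ]", "", string.lower()).split())
    return all(get_close_matches(word, words, cutoff=0.8) for word in pattern)

def match_any(string, patterns):
    # Hypothetical helper: True if at least one pattern matches the string.
    return any(match(string, pattern) for pattern in patterns)

# The second pattern is an assumption added for illustration.
pattern_list = ['education four year', 'military experience']
string1 = 'It is mandatory to have at least of four years of professional education'
string3 = 'We have four years of military experience'

# Per-pattern results for one string.
results = {pattern: match(string3, pattern) for pattern in pattern_list}
print(results)
print(match_any(string1, pattern_list))
```

Replacing any with all in match_any would require every pattern in the list to match instead.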

Answer 1: (score: -1)

Try this:

def have_same_words(string1, string2):
    return sorted(string1.split()) == sorted(string2.split())

print(have_same_words("It is mandatory to have at least of four years of professional education", "education four year"))
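Note that the comparison above only succeeds when both strings consist of exactly the same words, so for the strings in the question it prints False. A minimal sketch of a subset-style check, which asks only that every pattern word occurs in the text, could look like the following. The names normalize and contains_all_words are hypothetical, and stripping a trailing "s" is a crude stand-in for real stemming; punctuation is not handled.

```python
def normalize(word):
    # Hypothetical, crude normalization: lowercase and drop one trailing "s"
    # from words longer than three letters ("years" -> "year").
    word = word.lower()
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word

def contains_all_words(text, pattern):
    # True if every normalized pattern word appears among the text's words,
    # regardless of order or of extra words in the text.
    text_words = {normalize(w) for w in text.split()}
    return all(normalize(w) in text_words for w in pattern.split())

pattern = "education four year"
string1 = "It is mandatory to have at least of four years of professional education"
string2 = "need to have education four years with professional degree"
string3 = "We have four years of military experience"

print(contains_all_words(string1, pattern))  # True
print(contains_all_words(string2, pattern))  # True
print(contains_all_words(string3, pattern))  # False
```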

If this helps, please accept the answer.

Answer 2: (score: -2)

To check whether a string contains another string in Python, you can try a few things:

Use in:

>>> pattern_search in string
True

Or use find:

>>> string1.find(pattern_search)
[returns the index of the first occurrence (0 or greater) if found, or -1 if not found]
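Both in and find look for one contiguous substring, so they are sensitive to word order. A short sketch with the strings from the question, assuming pattern_search is a plain string rather than a list, shows the difference:

```python
pattern_search = 'education four year'
string1 = 'It is mandatory to have at least of four years of professional education'
string2 = 'need to have education four years with professional degree'

# string1 has all the words, but not as one contiguous run in this order.
in_string1 = pattern_search in string1       # False
index1 = string1.find(pattern_search)        # -1

# string2 contains the exact run "education four year" (followed by "s").
in_string2 = pattern_search in string2       # True
index2 = string2.find(pattern_search)        # 13

print(in_string1, index1)
print(in_string2, index2)
```

This is why a plain substring test matches string2 but not string1.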