I want to check whether two strings are similar to each other. For example:
string1 = "Select a valid choice. **aaaa** is not one of the available choices."
string2 = "Select a valid choice. **bbbb** is not one of the available choices."
or
string3 = "Ensure this value has at most 30 characters (it has 40 chars)."
string4 = "Ensure this value has at most 60 characters (it has 110 chars)."
If I compare string1 with string2, it should return True; if I compare string1 with string3, it should return False.
Answer (score: 3)
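You can use the Levenshtein distance, which is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other.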
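For reference, this is the standard dynamic-programming recurrence for the Levenshtein distance. The notation is introduced here only for illustration: $D_{i,j}$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$, and $[a_i \ne b_j]$ is 1 when the characters differ and 0 otherwise. The `lev` function in this answer fills this table one row at a time, keeping only the previous and the current row, and returns $D_{|a|,|b|}$.

$$
D_{i,0} = i, \qquad D_{0,j} = j, \qquad
D_{i,j} = \min\bigl(D_{i-1,j} + 1,\; D_{i,j-1} + 1,\; D_{i-1,j-1} + [a_i \ne b_j]\bigr)
$$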
```python
def lev(s1, s2):
    """Return the Levenshtein distance between s1 and s2."""
    if len(s1) < len(s2):
        return lev(s2, s1)

    # len(s1) >= len(s2)
    if len(s2) == 0:
        return len(s1)

    # Only two rows of the dynamic-programming table are kept in memory.
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            # j + 1 instead of j, since previous_row and current_row
            # are one character longer than s2
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row

    return previous_row[-1]


string1 = "Select a valid choice. aaaa is not one of the available choices."
string2 = "Select a valid choice. bbbb is not one of the available choices."
string3 = "Ensure this value has at most 30 characters (it has 40 chars)."
string4 = "Ensure this value has at most 60 characters (it has 110 chars)."

print(lev(string1, string2))  # => 4
print(lev(string3, string4))  # => 3
print(lev(string1, string3))  # => 49
```
Code copied from here.
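The distance alone does not give the True/False answer the question asks for; you still need a cutoff. A minimal sketch, assuming "similar" means the edit distance is at most 20% of the longer string's length. The helper name `are_similar` and the `max_ratio=0.2` default are hypothetical choices for illustration, not part of the original answer, and the threshold should be tuned on real messages.

```python
def lev(s1: str, s2: str) -> int:
    """Levenshtein distance using the same two-row DP as the answer above."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the rows as short as possible
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]


def are_similar(s1: str, s2: str, max_ratio: float = 0.2) -> bool:
    """Hypothetical helper: True when lev(s1, s2) is at most max_ratio
    times the length of the longer string (0.2 is an assumed cutoff)."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return True  # two empty strings are identical
    return lev(s1, s2) / longest <= max_ratio


string1 = "Select a valid choice. aaaa is not one of the available choices."
string2 = "Select a valid choice. bbbb is not one of the available choices."
string3 = "Ensure this value has at most 30 characters (it has 40 chars)."
string4 = "Ensure this value has at most 60 characters (it has 110 chars)."

if __name__ == "__main__":
    print(are_similar(string1, string2))  # True  (4 edits over 64 chars)
    print(are_similar(string3, string4))  # True  (3 edits over 63 chars)
    print(are_similar(string1, string3))  # False
```

Dividing by the length of the longer string keeps one cutoff usable for both short and long messages, where a fixed absolute distance would be too strict for long strings and too loose for short ones. If an exact edit distance is not required, the standard library's `difflib.SequenceMatcher(None, a, b).ratio()` returns a similarity score between 0 and 1 that can be thresholded the same way, although it is not based on the Levenshtein distance.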