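I am trying to find how many characters I would need to delete to make two words the same. For example, "at" and "cat" would be 1, because I can delete the c; "boat" and "got" would be 3, because I can delete b, a and g to leave "ot". I put each word's characters into a dictionary with their counts as the values. Then I iterate over one dictionary and check whether each key exists in the other dictionary; if it does not, I add 1 to the difference. Is this a very inefficient algorithm?
Also, it is overestimating the number of deletions I need.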
def deletiondistance(firstword, secondword):
    dfw = {}
    dsw = {}
    diff = 0
    for i in range(len(firstword)):
        print firstword[i]
        if firstword[i] in dfw:
            dfw[firstword[i]]+=1
        else:
            dfw[firstword[i]]=1
    for j in range(len(secondword)):
        if secondword[j] in dsw:
            dsw[secondword[j]] +=1
        else:
            dsw[secondword[j]]=1
    for key, value in dfw.iteritems():
        if key in dsw:
            #print "key exists"
            pass
        else:
            diff +=1
    print "diff",diff
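Answer 0: (score: 4)
I think what you are after is similar to the Levenshtein distance.
Levenshtein distance is a metric for measuring the distance between two strings.
Here is the Wikipedia link: https://en.wikipedia.org/wiki/Levenshtein_distance
Here is a PyPI package for Levenshtein distance: https://pypi.python.org/pypi/python-Levenshtein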
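To make the comparison concrete, here is a minimal pure-Python sketch of the classic Levenshtein distance, where insertions, deletions and substitutions each cost 1. The function name `levenshtein` is made up for this illustration and is not the API of the package linked above. Note that it does not give the numbers the question asks for: a substitution counts as one edit here, while the question would count it as two deletions.

```python
def levenshtein(s1, s2):
    """Classic edit distance: insert, delete and substitute each cost 1."""
    # prev[j] holds the distance between the processed prefix of s1 and s2[:j]
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        cur = [i]  # turning s1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            cur.append(min(prev[j] + 1,          # delete c1
                           cur[j - 1] + 1,       # insert c2
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]


print(levenshtein('at', 'cat'))    # 1, same as the question expects
print(levenshtein('cat', 'bat'))   # 1, but the question would count 2 deletions
print(levenshtein('boat', 'got'))  # 2, but the question expects 3
```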
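Answer 1: (score: 3)
As @Hulk said, this is similar to the Levenshtein distance. The only difference is that substitutions are not allowed, but that can be corrected by using a substitution cost of 2, which is the same as deleting one character from each of the two strings. For example: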
def dist(s1, s2):
    cur = list(range(len(s2) + 1))
    prev = [0] * (len(s2) + 1)
    for i in range(len(s1)):
        cur, prev = prev, cur
        cur[0] = i + 1
        for j in range(len(s2)):
            # Substitution is same as two deletions
            sub = 0 if s1[i] == s2[j] else 2
            cur[j+1] = min(prev[j] + sub, cur[j] + 1, prev[j+1] + 1)
    return cur[-1]

cases=[('cat','bat'),
       ('bat','cat'),
       ('broom', 'ballroom'),
       ('boat','got'),
       ('foo', 'bar'),
       ('foobar', '')]

for s1, s2 in cases:
    print('{} & {} = {}'.format(s1, s2, dist(s1, s2)))
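One way to cross-check this: when only deletions are allowed, every character that is not part of a longest common subsequence (LCS) of the two strings has to be removed, so the distance should equal len(s1) + len(s2) - 2 * LCS length. Below is a minimal sketch of that alternative formulation; `lcs_length` and `deletion_distance` are names invented for this illustration, not part of the answer above.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence of s1 and s2."""
    # prev[j] is the LCS length of the processed prefix of s1 and s2[:j]
    prev = [0] * (len(s2) + 1)
    for c1 in s1:
        cur = [0]
        for j, c2 in enumerate(s2, start=1):
            if c1 == c2:
                cur.append(prev[j - 1] + 1)           # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))  # skip a character in s1 or s2
        prev = cur
    return prev[-1]


def deletion_distance(s1, s2):
    """Characters to delete from both strings so that they become equal."""
    return len(s1) + len(s2) - 2 * lcs_length(s1, s2)


for a, b in [('at', 'cat'), ('cat', 'bat'), ('broom', 'ballroom'), ('boat', 'got')]:
    print('{} & {} = {}'.format(a, b, deletion_distance(a, b)))
```

Both versions take time proportional to len(s1) * len(s2), so this is a different way of looking at the same problem rather than a faster one.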
Answer 2: (score: 2)
You can use difflib.
Example:
import difflib

cases=[('cat','bat'),
       ('bat','cat'),
       ('broom', 'ballroom'),
       ('boat','got')]

for a,b in cases:
    print('{} => {}'.format(a,b))
    cnt=0
    for i,s in enumerate(difflib.ndiff(a, b)):
        if s[0]==' ': continue
        elif s[0]=='-':
            print(u'Delete "{}" from position {}'.format(s[-1],i))
        elif s[0]=='+':
            print(u'Add "{}" to position {}'.format(s[-1],i))
        cnt+=1
    print("total=",cnt,"\n")
Prints:
cat => bat
Delete "c" from position 0
Add "b" to position 1
total= 2
bat => cat
Delete "b" from position 0
Add "c" to position 1
total= 2
broom => ballroom
Add "a" to position 1
Add "l" to position 2
Add "l" to position 3
total= 3
boat => got
Delete "b" from position 0
Add "g" to position 1
Delete "a" from position 3
total= 3
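If only the total is needed, the same idea fits in a small helper. This is a sketch under one assumption worth stating loudly: difflib is designed to produce human-friendly diffs and does not promise a minimal edit sequence, so the count is best treated as an upper bound rather than a guaranteed deletion distance. The name `ndiff_edit_count` is made up for this illustration.

```python
import difflib


def ndiff_edit_count(a, b):
    """Count the characters that ndiff reports as removed from a or added from b."""
    # Each ndiff entry starts with '- ' (only in a), '+ ' (only in b) or '  ' (in both).
    return sum(1 for entry in difflib.ndiff(a, b) if entry[0] in '+-')


for a, b in [('cat', 'bat'), ('bat', 'cat'), ('broom', 'ballroom'), ('boat', 'got')]:
    print('{} => {}: {}'.format(a, b, ndiff_edit_count(a, b)))
```

An "Add" in the second string corresponds to a deletion from that string in the question's terms, which is why both kinds of entry are counted.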