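I am having trouble modifying the Hamming distance algorithm so that it treats my data differently in two ways:
Add 0.5 to the Hamming distance when an uppercase letter is swapped for a lowercase letter (or vice versa), unless it is in the first position.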
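Examples: "Killer" and "killer" have a distance of 0; "killer" and "KiLler" have a Hamming distance of 0.5; "Funny" and "FAnny" have a distance of 1.5 (1 for the different letter, 0.5 for the different capitalization).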
Make b and d (and their uppercase counterparts) count as the same letter.
Here is the code I found, which makes up the basic Hamming program:
def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__=="__main__":
    a = 'mark'
    b = 'Make'
    print hamming_distance(a, b)
Any suggestions are welcome!
Answer 0 (score: 0)
Here is a simple solution. It could certainly be optimized for better performance.
Note: I use Python 3, because Python 2 will retire soon.
def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    # b and d are interchangeable
    s1 = s1.replace('b', 'd').replace('B', 'D')
    s2 = s2.replace('b', 'd').replace('B', 'D')
    # add 1 for each different character
    hammingdist = sum(ch1 != ch2 for ch1, ch2 in zip(s1.lower(), s2.lower()))
    # add .5 for each lower/upper case difference (without first letter)
    for i in range(1, len(s1)):
        hammingdist += 0.5 * (s1[i] >= 'a' and s1[i] <= 'z' and
                              s2[i] >= 'A' and s2[i] <= 'Z' or
                              s1[i] >= 'A' and s1[i] <= 'Z' and
                              s2[i] >= 'a' and s2[i] <= 'z')
    return hammingdist

def print_hamming_distance(s1, s2):
    print("hamming distance between", s1, "and", s2, "is",
          hamming_distance(s1, s2))

if __name__ == "__main__":
    assert hamming_distance('mark', 'Make') == 2
    assert hamming_distance('Killer', 'killer') == 0
    assert hamming_distance('killer', 'KiLler') == 0.5
    assert hamming_distance('bole', 'dole') == 0
    print("all fine")
    print_hamming_distance("organized", "orGanised")
    # prints: hamming distance between organized and orGanised is 1.5
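
As for the performance remark above, one possible variant walks the two strings only once instead of building lowercase copies and looping a second time. This is only a sketch under a few assumptions: the names `weighted_hamming` and `_BD_TABLE` are made up for illustration, and `str.islower()` / `str.isupper()` replace the explicit 'a'..'z' range checks, so non-ASCII letters would be scored as case flips too, unlike in the code above.

```python
# Hypothetical names, not part of the answer above.
# Translation table that maps b -> d and B -> D so the two letters compare equal.
_BD_TABLE = str.maketrans("bB", "dD")


def weighted_hamming(s1, s2):
    """Single pass: 1 per differing letter, 0.5 per case flip after index 0."""
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    s1 = s1.translate(_BD_TABLE)
    s2 = s2.translate(_BD_TABLE)
    total = 0.0
    for i, (c1, c2) in enumerate(zip(s1, s2)):
        # Different letter, ignoring case.
        if c1.lower() != c2.lower():
            total += 1
        # Case flip: both are letters of opposite case; the first position is exempt.
        if i > 0 and ((c1.islower() and c2.isupper())
                      or (c1.isupper() and c2.islower())):
            total += 0.5
    return total
```

Raising `ValueError` rather than using `assert` is a design choice of this sketch: assertions are skipped when Python runs with `-O`, so the length check would silently disappear. With the question's own example, `weighted_hamming("Funny", "FAnny")` adds 1 for the different letter and 0.5 for the different capitalization.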