Modifying the Hamming distance / edit distance

Date: 2018-06-13 20:04:13

Tags: hamming-distance edit-distance

I can't figure out how to modify the Hamming distance algorithm so that it treats my data in two ways:

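  1. If an uppercase letter is swapped for a lowercase one (or vice versa), add 0.5 to the Hamming distance, unless the letter is in the first position.
    Examples: "Killer" and "killer" have a distance of 0, and "killer" and "KiLler" have a Hamming distance of 0.5. "funny" and "FAnny" have a distance of 1.5 (1 for the different letter, 0.5 for the different capitalization).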

  2. Treat b and d (and their uppercase counterparts, B and D) as the same letter.

Here is the code I found, which implements the basic Hamming distance:

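    def hamming_distance(s1, s2):
        # Hamming distance is only defined for strings of equal length.
        assert len(s1) == len(s2)
        # Count the positions where the characters differ.
        return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

    if __name__ == "__main__":
        a = 'mark'
        b = 'Make'
        print(hamming_distance(a, b))  # prints 3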
    

Any suggestions are welcome!
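Read together, the two rules give each aligned pair of characters its own cost. The sketch below makes that per-position breakdown explicit. It assumes equal-length strings, that "first position" means index 0, and that a changed letter which also changes case costs 1.5, as in the FAnny example. `position_costs` and `_FOLD_BD` are hypothetical names, and the case check uses `str.islower()`/`str.isupper()` rather than ASCII ranges, so accented letters are covered too.

```python
from typing import List

# Translation table mapping b -> d and B -> D, so the two letters compare equal (rule 2).
_FOLD_BD = str.maketrans("bB", "dD")


def position_costs(s1: str, s2: str) -> List[float]:
    """Return the cost contributed by each aligned pair of characters.

    Hypothetical helper: a differing letter costs 1, and a lower/upper case
    mismatch adds 0.5 except at index 0 (rule 1).
    """
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs strings of equal length")
    t1, t2 = s1.translate(_FOLD_BD), s2.translate(_FOLD_BD)
    costs = []
    for i, (c1, c2) in enumerate(zip(t1, t2)):
        # 1 if the letters differ once case is ignored, else 0.
        cost = 0.0 if c1.lower() == c2.lower() else 1.0
        # Extra 0.5 when one side is lowercase and the other uppercase,
        # except at the first position.
        case_mismatch = (c1.islower() and c2.isupper()) or (c1.isupper() and c2.islower())
        if i > 0 and case_mismatch:
            cost += 0.5
        costs.append(cost)
    return costs


print(position_costs("funny", "FAnny"))         # [0.0, 1.5, 0.0, 0.0, 0.0]
print(sum(position_costs("killer", "KiLler")))  # 0.5
```

Summing the returned list gives the modified distance. For "organized" and "orGanised" the costs are 0.5 at the G and 1 at the s, so the sum is 1.5.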

1 answer:

Answer 0 (score: 0)

Here is a simple solution. It could certainly be optimized for better performance.

Note: I use Python 3, because Python 2 will retire soon.

def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    # b and d are interchangeable
    s1 = s1.replace('b', 'd').replace('B', 'D')
    s2 = s2.replace('b', 'd').replace('B', 'D')
    # add 1 for each different character
    hammingdist = sum(ch1 != ch2 for ch1, ch2 in zip(s1.lower(), s2.lower()))
    # add .5 for each lower/upper case difference (ignoring the first letter)
    for i in range(1, len(s1)):
        hammingdist += 0.5 * (('a' <= s1[i] <= 'z' and 'A' <= s2[i] <= 'Z') or
                              ('A' <= s1[i] <= 'Z' and 'a' <= s2[i] <= 'z'))
    return hammingdist

def print_hamming_distance(s1, s2):
    print("hamming distance between", s1, "and", s2, "is",
          hamming_distance(s1, s2))

if __name__ == "__main__":
    assert hamming_distance('mark', 'Make') == 2
    assert hamming_distance('Killer', 'killer') == 0
    assert hamming_distance('killer', 'KiLler') == 0.5
    assert hamming_distance('bole', 'dole') == 0
    print("all fine")
    print_hamming_distance("organized", "orGanised")
    # prints: hamming distance between organized and orGanised is 1.5
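The `hamming_distance` above asserts that both strings have the same length, as a Hamming distance requires. Since the question is also tagged edit-distance, here is a sketch of how the same rules could serve as substitution costs in the standard Levenshtein dynamic program. This is an assumption beyond what the question asks: insertions and deletions cost 1 each, and "first position" is read as both characters being the first of their strings. `weighted_edit_distance`, `sub_cost` and `_FOLD` are hypothetical names.

```python
# Map b -> d and B -> D so the two letters count as the same (rule 2).
_FOLD = str.maketrans("bB", "dD")


def sub_cost(c1, c2, at_start):
    """Substitution cost under the question's rules (hypothetical helper)."""
    c1, c2 = c1.translate(_FOLD), c2.translate(_FOLD)
    cost = 0.0 if c1.lower() == c2.lower() else 1.0
    case_mismatch = (c1.islower() and c2.isupper()) or (c1.isupper() and c2.islower())
    if case_mismatch and not at_start:
        cost += 0.5
    return cost


def weighted_edit_distance(s1, s2):
    """Levenshtein distance using sub_cost; insert/delete cost 1 (assumption)."""
    m, n = len(s1), len(s2)
    # dp[i][j] = cheapest way to turn s1[:i] into s2[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = float(i)  # delete all of s1[:i]
    for j in range(1, n + 1):
        dp[0][j] = float(j)  # insert all of s2[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1,  # delete s1[i-1]
                dp[i][j - 1] + 1,  # insert s2[j-1]
                # substitute; the case penalty is waived only for the first pair
                dp[i - 1][j - 1] + sub_cost(s1[i - 1], s2[j - 1], i == 1 and j == 1),
            )
    return dp[m][n]


print(weighted_edit_distance("funny", "FAnny"))  # 1.5
print(weighted_edit_distance("bole", "doles"))   # 1.0
```

Unlike the Hamming version, this can return a smaller value for equal-length strings, because a deletion plus an insertion may be cheaper than several substitutions: "xyz" and "yzx" are 3 apart under Hamming but 2 under this sketch.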