Finding the closest name with a Levenshtein function

Time: 2018-11-22 16:41:54

Tags: python

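I need help with the following code. I need to find the word that is closest to an input word. To test it, I set word_0 to 'pikaru', which should return 'pikachu'. The levenshtein_distance function returns the distance between the two words passed to it. When I run the code below, I get charmander as the answer, which is way off. Any help would be appreciated.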

import backend

name_to_stats, id_to_name, names, pokemon_by_type = backend.get_pokemon_stats()
words = names


word_0 = 'pikaru'
def find_closest_word(word_0, words):
    """Finds the closest word in the list to word_0 as measured by the
    Levenshtein distance

    Args:
        word_0: a str
        words: a list of str

    Returns:
        The closest word in words to word_0 as a str.
    """
    # Hint: use the levenshtein_distance() function to help you out here.
    closest_word = words[0]
    #closest_distance = levenshtein_distance(word_0, words[0])

    for i in words:
        distance = levenshtein_distance(word_0, closest_word)
        new_distance = levenshtein_distance(word_0, i)
        if distance < new_distance:
            return i





def levenshtein_distance(s1, s2):
    """Returns the Levenshtein distance between strs s1 and s2

    Args:
        s1: a str
        s2: a str
    """
    # This function has already been implemented for you.
    # Source of the implementation:
    # https://stackoverflow.com/questions/2460177/edit-distance-in-python
    # If you'd like to know more about this algorithm, you can study it in
    # CSCC73 Algorithms. It applies an advanced technique called dynamic
    # programming.
    # For more information:
    # https://en.wikipedia.org/wiki/Levenshtein_distance
    # https://en.wikipedia.org/wiki/Dynamic_programming
    if len(s1) > len(s2):
        s1, s2 = s2, s1

    distances = range(len(s1) + 1)
    for i2, c2 in enumerate(s2):
        distances_ = [i2+1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                distances_.append(distances[i1])
            else:
                distances_.append(1 + min((distances[i1], distances[i1 + 1],
                                       distances_[-1])))
        distances = distances_
    return distances[-1]
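As a quick sanity check of levenshtein_distance, the sketch below copies the implementation above verbatim so it runs on its own, and measures 'pikaru' against a small hypothetical word list (a stand-in for the names returned by the asker's backend module, which is not shown). The expected answer, pikachu, is only two edits away (substitute r with c, insert h), while charmander is much farther, so the distance function itself is not the problem.

```python
def levenshtein_distance(s1, s2):
    """Return the Levenshtein distance between strs s1 and s2.

    Same one-row dynamic-programming implementation as in the question.
    """
    if len(s1) > len(s2):
        s1, s2 = s2, s1  # keep the shorter string in the inner loop

    distances = range(len(s1) + 1)  # row for the empty prefix of s2
    for i2, c2 in enumerate(s2):
        distances_ = [i2 + 1]  # cost of turning s2[:i2+1] into ""
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                # Characters match: no extra edit needed
                distances_.append(distances[i1])
            else:
                # 1 + cheapest of substitution, deletion, insertion
                distances_.append(1 + min((distances[i1], distances[i1 + 1],
                                           distances_[-1])))
        distances = distances_
    return distances[-1]


# Hypothetical candidates, not the real names list
distances_from_pikaru = {
    name: levenshtein_distance("pikaru", name)
    for name in ["pikachu", "pichu", "charmander"]
}
print(distances_from_pikaru)
# {'pikachu': 2, 'pichu': 3, 'charmander': 9}
```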

1 Answer:

Answer 0 (score: 1)

The error seems to be in the return statement of find_closest_word:

if distance < new_distance:
    return i

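This function does not find the closest word. It actually returns the first word in the list that is farther from word_0 than words[0] is (and None if there is no such word). Instead, loop over words and keep track of the best word you have seen so far. Something like this: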

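def find_closest_word(word_0, words):
    # Start with the first word as the best candidate seen so far
    best_distance = levenshtein_distance(word_0, words[0])
    best_word = words[0]
    for w in words:
        d = levenshtein_distance(word_0, w)
        if d < best_distance:  # strictly closer than anything seen so far
            best_distance = d
            best_word = w

    return best_word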
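To see both behaviours side by side, the sketch below runs the question's original loop and the answer's corrected loop on a small hypothetical list (pichu, pikachu, charmander; not the real names from backend). Because charmander is the first entry farther from 'pikaru' than words[0] is, the original loop returns it, which reproduces the symptom in the question. The last function is an equivalent variant using the built-in min() with a key argument. The names find_closest_word_buggy and find_closest_word_min are illustrative only.

```python
def levenshtein_distance(s1, s2):
    """Levenshtein distance, same implementation as in the question."""
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    distances = range(len(s1) + 1)
    for i2, c2 in enumerate(s2):
        distances_ = [i2 + 1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                distances_.append(distances[i1])
            else:
                distances_.append(1 + min((distances[i1], distances[i1 + 1],
                                           distances_[-1])))
        distances = distances_
    return distances[-1]


def find_closest_word_buggy(word_0, words):
    # Question's version: returns the first word FARTHER than words[0]
    closest_word = words[0]
    for i in words:
        distance = levenshtein_distance(word_0, closest_word)
        new_distance = levenshtein_distance(word_0, i)
        if distance < new_distance:
            return i
    # Falls through and implicitly returns None if no word is farther


def find_closest_word(word_0, words):
    # Answer's version: remember the best candidate seen so far
    best_word = words[0]
    best_distance = levenshtein_distance(word_0, best_word)
    for w in words[1:]:
        d = levenshtein_distance(word_0, w)
        if d < best_distance:
            best_distance = d
            best_word = w
    return best_word


def find_closest_word_min(word_0, words):
    # min() keeps the first word with the smallest distance,
    # so ties are broken the same way as the loop above
    return min(words, key=lambda w: levenshtein_distance(word_0, w))


words = ["pichu", "pikachu", "charmander"]
print(find_closest_word_buggy("pikaru", words))  # charmander
print(find_closest_word("pikaru", words))        # pikachu
print(find_closest_word_min("pikaru", words))    # pikachu
```

The min() form avoids managing best_word and best_distance by hand. Both versions compute one distance per word, so for a list of n names the cost is n calls to levenshtein_distance either way.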