Question

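Given a finite dictionary of words and a start/end pair (e.g. "hands" and "feet" in the example below), find the shortest sequence of words such that any word in the sequence can be formed from either of its neighbors by 1) inserting a character, 2) deleting a character, or 3) changing a character.

hands -> hand -> and -> end -> fend -> feed -> feet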
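To make the neighbor relation concrete, here is a minimal Python sketch of a pairwise check. `is_neighbor` is a hypothetical helper name, not part of any answer below, and it assumes words are compared character by character with no case folding.

```python
def is_neighbor(a: str, b: str) -> bool:
    """True if b can be formed from a by exactly one insertion,
    deletion, or substitution of a single character."""
    if a == b:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    # Insertion/deletion: removing one character from the longer word
    # must yield the shorter one.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


sequence = ["hands", "hand", "and", "end", "fend", "feed", "feet"]
# Every consecutive pair in the example sequence is one edit apart.
print(all(is_neighbor(x, y) for x, y in zip(sequence, sequence[1:])))  # True
```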

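For those who may be wondering: this is not a homework problem assigned to me, nor a question I was asked in an interview; it is just a problem that interests me.

I'm looking for a one- or two-sentence "top-down view" of the approach you would take, and, for the bold, a working implementation in any language.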

Answer 1

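Instead of turning the dictionary into a full graph, use something with a little less structure:

For each word in the dictionary, you get a shortened_word by deleting character number i, for each i in range(len(word)). Map each pair (shortened_word, i) to a list of all the words that produce it.

This makes it easy to find all words that differ by one replaced letter (because they must sit in the same (shortened_word, i) bin for some i), as well as words with one extra letter (because they must sit in some (word, i) bin for some i).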
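A minimal sketch of what those bins look like, using a hypothetical four-word toy dictionary (not the ispell list used below) and the same deletion idea as the code that follows:

```python
from collections import defaultdict


def shortened_words(word):
    # Yield (word with character i deleted, i) for every position i.
    for i in range(len(word)):
        yield word[:i] + word[i + 1:], i


# Hypothetical toy dictionary, for illustration only.
toy = ["hand", "band", "and", "hands"]

bins = defaultdict(list)
for w in toy:
    for key in shortened_words(w):
        bins[key].append(w)

# Same length, one substitution: "hand" and "band" both lose position 0
# and land in the ("and", 0) bin.
print(bins[("and", 0)])   # ['hand', 'band']

# One letter longer: "hands" minus index 4 is "hand", so it is filed
# under ("hand", 4); looking up ("hand", i) for every i finds it.
print(bins[("hand", 4)])  # ['hands']

# One letter shorter: delete each position of "hand" and keep real words.
words = set(toy)
print([s for s, i in shortened_words("hand") if s in words])  # ['and']
```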

Python code:

from collections import defaultdict, deque
from itertools import chain

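def shortened_words(word):
    # Yield (word with character i deleted, i) for every position i.
    for i in range(len(word)):
        yield word[:i] + word[i + 1:], i


def prepare_graph(d):
    # Map each (shortened_word, i) key to every dictionary word that
    # produces it when its i-th character is deleted.
    g = defaultdict(list)
    for word in d:
        for short in shortened_words(word):
            g[short].append(word)
    return g


def walk_graph(g, d, start, end):
    # Breadth-first search from start; seen maps each word to its predecessor.
    todo = deque([start])
    seen = {start: None}
    while todo:
        word = todo.popleft()
        if word == end: # end is reachable
            break

        # Words with one substituted letter share a (shortened_word, i) bin.
        same_length = chain(*(g[short] for short in shortened_words(word)))
        # Words with one extra letter are filed under (word, i).
        one_longer = chain(*(g[word, i] for i in range(len(word) + 1)))
        # Words with one letter fewer are deletions of word that occur in d
        # (a linear scan if d is a list).
        one_shorter = (w for w, i in shortened_words(word) if w in d)
        for next_word in chain(same_length, one_longer, one_shorter):
            if next_word not in seen:
                seen[next_word] = word
                todo.append(next_word)
    else: # no break, i.e. not reachable
        return None # not reachable

    # Follow the predecessor links back from end to start.
    path = [end]
    while path[-1] != start:
        path.append(seen[path[-1]])
    return path[::-1]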

Usage:

dictionary = ispell_dict # list of 47158 words

graph = prepare_graph(dictionary)
print(" -> ".join(walk_graph(graph, dictionary, "hands", "feet")))
print(" -> ".join(walk_graph(graph, dictionary, "brain", "game")))

Output:

hands -> bands -> bends -> bents -> beets -> beet -> feet
brain -> drain -> drawn -> dawn -> damn -> dame -> game

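A word on speed: building the 'graph helper' is fast (1 second), but hands -> feet takes 14 seconds and brain -> game takes 7 seconds.

Edit: If you need more speed, you can try a graph or network library. Or you can actually build the full graph (slow) and then find paths much faster. This mostly amounts to moving the edge lookup from the walk function into the graph-building function: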

def prepare_graph(d):
    # Same (shortened_word, i) bins as before.
    g = defaultdict(list)
    for word in d:
        for short in shortened_words(word):
            g[short].append(word)

    # Precompute every word's neighbor set once, so walk_graph
    # only has to iterate over next_words[word].
    next_words = {}
    for word in d:
        same_length = chain(*(g[short] for short in shortened_words(word)))
        one_longer = chain(*(g[word, i] for i in range(len(word) + 1)))
        one_shorter = (w for w, i in shortened_words(word) if w in d)
        next_words[word] = set(chain(same_length, one_longer, one_shorter))
        # A word is never its own neighbor (discard also copes with "").
        next_words[word].discard(word)

    return next_words


def walk_graph(g, start, end):
    todo = deque([start])
    seen = {start: None}
    while todo:
        word = todo.popleft()
        if word == end: # end is reachable
            break

        for next_word in g[word]:
            if next_word not in seen:
                seen[next_word] = word
                todo.append(next_word)
    else: # no break, i.e. not reachable
        return None # not reachable

    path = [end]
    while path[-1] != start:
        path.append(seen[path[-1]])
    return path[::-1]

Usage: first build the graph (slow; all timings on some i5 laptop, YMMV).

dictionary = ispell_dict # list of 47158 words
graph = prepare_graph(dictionary)  # more than 6 minutes!

Now finding the paths is much faster than before (timings exclude printing):

print(" -> ".join(walk_graph(graph, "hands", "feet")))          # 10 ms
print(" -> ".join(walk_graph(graph, "brain", "game")))          #  6 ms
print(" -> ".join(walk_graph(graph, "tampering", "crunchier"))) # 25 ms

Output:

hands -> lands -> lends -> lens -> lees -> fees -> feet
brain -> drain -> drawn -> dawn -> damn -> dame -> game
tampering -> tapering -> capering -> catering -> watering -> wavering -> havering -> hovering -> lovering -> levering -> leering -> peering -> peeping -> seeping -> seeing -> sewing -> swing -> swings -> sings -> sines -> pines -> panes -> paces -> peaces -> peaches -> beaches -> benches -> bunches -> brunches -> crunches -> cruncher -> crunchier

Answer 2

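A naive approach could be to turn the dictionary into a graph, with the words as nodes and edges connecting "neighbors" (i.e. words that can be turned into one another with a single operation). Then you can use a shortest-path algorithm to find the distance between word A and word B.

The hard part of this approach is finding a way to turn the dictionary into a graph efficiently.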
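A minimal sketch of that naive approach, assuming a small in-memory word list. It compares every pair of words (O(n²) neighbor checks, which is exactly the inefficiency mentioned above) and then runs a breadth-first search, which finds a shortest path because every edge has the same weight. `is_neighbor`, `build_graph` and `shortest_path` are hypothetical names chosen for illustration.

```python
from collections import deque
from itertools import combinations


def is_neighbor(a, b):
    # One substitution, insertion or deletion turns a into b.
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def build_graph(words):
    # Naive construction: test every unordered pair of distinct words.
    unique = list(dict.fromkeys(words))
    graph = {w: [] for w in unique}
    for a, b in combinations(unique, 2):
        if is_neighbor(a, b):
            graph[a].append(b)
            graph[b].append(a)
    return graph


def shortest_path(graph, start, end):
    # Breadth-first search; parent records how each word was first reached.
    if start not in graph or end not in graph:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == end:
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        for nxt in graph[word]:
            if nxt not in parent:
                parent[nxt] = word
                queue.append(nxt)
    return None


# Hypothetical toy dictionary, for illustration only.
words = ["hands", "hand", "and", "end", "fend", "feed", "feet", "fee"]
print(" -> ".join(shortest_path(build_graph(words), "hands", "feet")))
# hands -> hand -> and -> end -> fend -> feed -> feet
```

With a 47158-word dictionary the pairwise loop needs roughly a billion comparisons, which is why Answer 1 groups words into bins instead.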

Answer 3

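Quick answer: you can compute the Levenshtein distance, the "common" edit distance found in most dynamic programming texts, and try to build the path from the resulting computation table.

From the Wikipedia link: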

if s[i] = t[j] then
  d[i, j] := d[i-1, j-1]       // no operation required
else
  d[i, j] := minimum
             (
               d[i-1, j] + 1,  // a deletion
               d[i, j-1] + 1,  // an insertion
               d[i-1, j-1] + 1 // a substitution
             )

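You can record when each of these cases occurs in the code (perhaps in an auxiliary table), and from there it is, of course, easy to reconstruct the solution path.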
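A minimal Python sketch of that idea, assuming the standard unit-cost Levenshtein recurrence. `edit_path` (a hypothetical name) fills the table, then walks back from the bottom-right cell; the string at cell (i, j) of the walk is `s[:i] + t[j:]`, so each step changes exactly one character. Note that the table only knows the two endpoint strings, so the intermediate strings it produces are generally not dictionary words.

```python
def edit_path(s: str, t: str):
    """Return (Levenshtein distance, one shortest chain of edits from s to t)."""
    m, n = len(s), len(t)
    # d[i][j] = edit distance between the prefixes s[:i] and t[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # a deletion
                          d[i][j - 1] + 1,         # an insertion
                          d[i - 1][j - 1] + cost)  # a substitution (or a match)

    # Walk back from d[m][n]; at cell (i, j) the current string is s[:i] + t[j:].
    path = [s]
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            # Substitution (or match): s[i - 1] becomes t[j - 1].
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            # Deletion: s[i - 1] is dropped.
            i -= 1
        else:
            # Insertion: t[j - 1] is added.
            j -= 1
        current = s[:i] + t[j:]
        if current != path[-1]:  # a match leaves the string unchanged
            path.append(current)
    return d[m][n], path


print(edit_path("hands", "feet"))
# (5, ['hands', 'handt', 'hanet', 'haeet', 'hfeet', 'feet'])
```

The example output shows the limitation: the chain is as short as any edit sequence can be, but "handt" or "hfeet" would have to be in the dictionary for it to answer the original question.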

Transform one word into another by changing, inserting, or deleting one character at a time
