单词列表中最长的单词链

时间:2018-11-26 16:11:32

标签: python recursion graph path-finding

所以,这是我要制作的函数的一部分。

我不希望代码太复杂。

我有一个单词列表,例如

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

单词链顺序的概念是让下一个单词以最后一个单词结尾的字母开头。

(编辑:每个单词不能多次使用。除此之外,没有其他限制。)

我希望输出给出最长的单词链序列,在这种情况下为:

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

我不太确定该怎么做,我尝试了不同的尝试。其中之一...

如果我们从列表中的特定单词开始,则此代码可以正确找到单词链,例如单词[0](所以是“长颈鹿”):

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

word_chain = []

word_chain.append(words[0])

for word in words:
    for char in word[0]:

       if char == word_chain[-1][-1]:
            word_chain.append(word)

print(word_chain)

输出:

['giraffe', 'elephant', 'tiger', 'racoon']

但是,我想找到最长的单词链(如上所述)。

我的方法:因此,我尝试使用上面编写并遍历的工作代码,以列表中的每个单词为起点,并找到每个单词的单词链[0 ],word [1],word [2]等。然后,我尝试使用if语句查找最长的单词链,并将其长度与先前的最长​​链进行比较,但是我做得不好,我没有真的不知道这是怎么回事。

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

word_chain = []
max_length = 0
for starting_word_index in range(len(words) - 1):

    word_chain.append(words[starting_word_index])

    for word in words:
        for char in word[0]:

            if char == word_chain[-1][-1]:
                word_chain.append(word)

    # Not sure

    if len(word_chain) > max_length:
        final_word_chain = word_chain
        longest = len(word_chain)
        word_chain.clear()

print(final_word_chain)

这是我的第n次尝试,我认为这是打印一个空列表,在此之前,我进行了不同的尝试,但未能正确清除word_chain列表,并最终再次重复了单词。

任何帮助,不胜感激。希望我不会太烦人或令人困惑...谢谢!

9 个答案:

答案 0 :(得分:26)

您可以使用递归来探索将包含正确的初始字符的每个可能字母添加到运行列表时出现的每个“分支”:

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
def get_results(_start, _current, _seen):
  if all(c in _seen for c in words if c[0] == _start[-1]):
    yield _current
  else:
      for i in words:
        if i[0] == _start[-1]:
          yield from get_results(i, _current+[i], _seen+[i])


new_d = [list(get_results(i, [i], []))[0] for i in words]
final_d = max([i for i in new_d if len(i) == len(set(i))], key=len)

输出:

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

此解决方案的工作原理类似于广度优先搜索,因为只要之前未调用当前值,函数get_resuls就会继续遍历整个列表。该函数已看到的值将添加到_seen列表中,最终停止递归调用的流。

此解决方案还将忽略具有重复项的结果:

words = ['giraffe', 'elephant', 'ant', 'ning', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse',]
new_d = [list(get_results(i, [i], []))[0] for i in words]
final_d = max([i for i in new_d if len(i) == len(set(i))], key=len)

输出:

['ant', 'tiger', 'racoon', 'ning', 'giraffe', 'elephant']

答案 1 :(得分:16)

我有一个新主意,如图所示:

enter image description here

我们可以用word [0] == word [-1]构造有向图,然后将问题转换为找到最大长度的路径。

答案 2 :(得分:11)

正如其他人所提到的,问题在于找到longest path in a directed acyclic graph

对于任何与Python相关的图形,networkx是您的朋友。

您只需要初始化图,添加节点,添加边并启动dag_longest_path

import networkx as nx
import matplotlib.pyplot as plt

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat',
         'hedgehog', 'mouse']

G = nx.DiGraph()
G.add_nodes_from(words)

for word1 in words:
    for word2 in words:
        if word1 != word2 and word1[-1] == word2[0]:
            G.add_edge(word1, word2)
nx.draw_networkx(G)
plt.show()
print(nx.algorithms.dag.dag_longest_path(G))

enter image description here

它输出:

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

注意:此算法仅在图形中没有循环(循环)时才起作用。这意味着它将以['ab', 'ba']失败,因为存在无限长的路径:['ab', 'ba', 'ab', 'ba', 'ab', 'ba', ...]

答案 3 :(得分:4)

本着蛮力解决方案的精神,您可以检查words列表的所有排列并选择最佳的连续开始顺序:

from itertools import permutations

def continuous_starting_sequence(words):
    chain = [words[0]]
    for i in range(1, len(words)):
        if not words[i].startswith(words[i - 1][-1]):
            break
        chain.append(words[i])
    return chain

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
best = max((continuous_starting_sequence(seq) for seq in permutations(words)), key=len)

print(best)
# ['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

由于我们正在考虑所有排列,因此我们知道必须存在以最大字链开头的排列。

这当然具有 O(n n!)时间复杂度:D

答案 4 :(得分:3)

此函数创建一种称为生成器的迭代器(请参见:What does the "yield" keyword do?)。它递归地创建同一生成器的其他实例,以探索所有可能的尾部序列:

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

def chains(words, previous_word=None):
    # Consider an empty sequence to be valid (as a "tail" or on its own):
    yield []
    # Remove the previous word, if any, from consideration, both here and in any subcalls:
    words = [word for word in words if word != previous_word]
    # Take each remaining word...
    for each_word in words:
        # ...provided it obeys the chaining rule
        if not previous_word or each_word.startswith(previous_word[-1]):
            # and recurse to consider all possible tail sequences that can follow this particular word:
            for tail in chains(words, previous_word=each_word):
                # Concatenate the word we're considering with each possible tail:
                yield [each_word] + tail  

all_legal_sequences = list(chains(words))  # convert the output (an iterator) to a list
all_legal_sequences.sort(key=len) # sort the list of chains in increasing order of chain length
for seq in all_legal_sequences: print(seq)
# The last line (and hence longest chain) prints as follows:
# ['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

或者,为了更有效地直达最长的链条:

print(max(chains(words), key=len)

最后,这是一个替代版本,允许在输入中重复单词(即,如果您包含一个单词N次,则最多可以在该链中使用N次):

def chains(words, previous_word_index=None):
    yield []
    if previous_word_index is not None:
        previous_letter = words[previous_word_index][-1]
        words = words[:previous_word_index] + words[previous_word_index + 1:]
    for i, each_word in enumerate( words ):
        if previous_word_index is None or each_word.startswith(previous_letter):
            for tail in chains(words, previous_word_index=i):
                yield [each_word] + tail  

答案 5 :(得分:2)

这是一种可行的递归蛮力方法:

def brute_force(pool, last=None, so_far=None):
    so_far = so_far or []
    if not pool:
        return so_far
    candidates = []
    for w in pool:
        if not last or w.startswith(last):
            c_so_far, c_pool = list(so_far) + [w], set(pool) - set([w])
            candidates.append(brute_force(c_pool, w[-1], c_so_far))
    return max(candidates, key=len, default=so_far)

>>> brute_force(words)
['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

在每个递归调用中,它将尝试使用剩余池中的每个合格单词继续链。然后,选择最长的此类连续。

答案 6 :(得分:2)

对于这个问题,我有一个基于树的方法,可能更快。我仍在执行代码,但是这是我要做的:

    1. Form a tree with the root node as first word. 
    2. Form the branches if there is any word or words that starts 
with the alphabet with which this current word ends.
    3. Exhaust the entire given list based on the ending alphabet
 of current word and form the entire tree.
    4. Now just find the longest path of this tree and store it.
    5. Repeat steps 1 to 4 for each of the words given in the list
 and print the longest path among the longest paths we got above.

我希望在给出大量单词的情况下,这可能会提供更好的解决方案。我将用实际的代码实现对此进行更新。

答案 7 :(得分:1)

希望这是一种更直观的方法,无需递归。遍历列表,让Python的排序和列表理解为您完成工作:

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

def chain_longest(pivot, words):
    new_words = []
    new_words.append(pivot)
    for word in words:
        potential_words = [i for i in words if i.startswith(pivot[-1]) and i not in new_words]
        if potential_words:
            next_word = sorted(potential_words, key = lambda x: len)[0]
            new_words.append(next_word)
            pivot = next_word
        else:
            pass
    return new_words

max([chain_longest(i, words) for i in words], key = len)
>>
['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

设置一个中心点,并检查potential_words是否以您的中心词开头并且没有出现在新的单词列表中。如果找到,则按长度排序,并采用第一个元素。

列表理解以每个词为中心,并返回最长的链。

答案 8 :(得分:1)

使用递归方法的另一个答案:

def word_list(w_list, remaining_list):
    max_result_len=0
    res = w_list
    for word_index in range(len(remaining_list)):
        # if the last letter of the word list is equal to the first letter of the word
        if w_list[-1][-1] == remaining_list[word_index][0]:
            # make copies of the lists to not alter it in the caller function
            w_list_copy = w_list.copy()
            remaining_list_copy = remaining_list.copy()
            # removes the used word from the remaining list
            remaining_list_copy.pop(word_index)
            # append the matching word to the new word list
            w_list_copy.append(remaining_list[word_index])
            res_aux = word_list(w_list_copy, remaining_list_copy)
            # Keep only the longest list
            res = res_aux if len(res_aux) > max_result_len else res 
    return res

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
word_list(['dog'], words)

输出:

['dog', 'giraffe', 'elephant', 'tiger', 'racoon']