python代码的时间复杂度,用于查找可以由列表中的其他单词组成的最长单词(后续)

时间:2016-02-17 16:59:09

标签: python algorithm recursion time-complexity memoization

我知道已经提出并解决了类似的问题(Time complexity of python code for finding the longest word that can be made of other words in the list),但我有一个后续问题。

我有以下问题,

# Given a list of words, find the longest word made of other words in the list.
# Example: [cat, banana, dog, nana, walk, walker, dogwalker]
# Result: dogwalker

我在Python 3中有以下实现:

def find_longest_word(list_words):
    list_words = sorted(list_words, key=lambda x:len(x))[::-1]
    for elem in list_words:
        set_remaining = set(list_words) - set([elem])
        if find_if_composite(elem, set_remaining, {}):
            return elem
    return None

def find_if_composite(s, set_):
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite(s[i:], set_): # assuming that I can reuse the string
                return True
    return False

我们从前面的回答中知道这段代码在O(N!)中运行,其中N是字符串的大小(如果我是正确的话)。

我的问题是:有没有办法改善时间复杂度(例如,使用memoization)?如果没有,为什么?如果有,怎么样?我尝试了以下代码,但似乎它在递归调用期间从未命中过备忘录。

def find_if_composite_memo(s, set_, memo):
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in memo:
        return memo[s]
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite_memo(s[i:], set_, memo):
                memo[s] = True
                memo[s[i:]] = True
                return True
    memo[s] = False
    return False

要测试,请使用

b = ["cat", "banana", "dog", "nana", "walk", "walker", "dogwalker", "dogwalkerbanana"]
print(find_longest_word(b))

1 个答案:

答案 0 :(得分:0)

只有在拥有完整解决方案时才能使用备忘录。 由于您按长度顺序对列表进行排序,因此您的第一个完整解决方案是所需的答案。 因此,您永远不会使用您的备忘录。

我在你的memoization例程中添加了一些跟踪,并稍微增强了测试:

def find_if_composite_memo(s, set_, memo):
    print "ENTER", s, '\t', memo
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in memo:
        print "Found", s, memo[s]
        return memo[s]
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite_memo(s[i:], set_, memo):
                memo[s] = True
                memo[s[i:]] = True
                print "Add", memo
                return True
    memo[s] = False
    return False

b = ["cat", "banana", "dog", "nana",
     "walk", "walker",
     "dogcat", "dogwalker",
     "dogwalkerbanana",
     "catwalkerbanana-FAIL"]
print(find_longest_word(b))

输出:

ENTER catwalkerbanana-FAIL  {}
ENTER walkerbanana-FAIL     {}
ENTER erbanana-FAIL     {}
ENTER banana-FAIL   {'erbanana-FAIL': False}
ENTER -FAIL     {'erbanana-FAIL': False}
ENTER dogwalkerbanana   {}
ENTER walkerbanana  {}
ENTER erbanana  {}
ENTER banana    {'erbanana': False}
Add {'walkerbanana': True, 'erbanana': False, 'banana': True}
Add {'walkerbanana': True, 'erbanana': False, 'dogwalkerbanana': True, 'banana': True}
dogwalkerbanana

你看到了问题吗?通过" catwalkerbanana-FAIL",你有可能记录"时装秀"," catwalker"和" walkerbanana",但不要'利用这一点。最后一个可以节省一些搜索时间。

至于改进整体算法,您可以考虑SE的code review网站。

如果您使用以下某些功能,我希望您可以做得更好:

(1) When you fail to find the present top-level word **elem** as a composite, you can remove it from the word list entirely.  Don't keep it in set_remaining: since it's at least as long as any other word in the list, it cannot be a component.
(2) In **find_if_composite**, you go through a lot of pain to determine whether any beginning substring of **s** is in the word_list.  I think you can speed this up: use **s.startswith(word)** on each word in the list.  
(3) If the list has enough composite hits to matter, examine your memoization algorithm carefully.  Figure out what is and is not worth saving, and write the pseudo-code to handle only the time-saving parts.  Remember that, in working from largest to smallest, you're going to short-circuit a lot of cases.  In this test set, though, finding "walkerbanana" the first time seen would save a recursion later.

同时寻找使用其他string operations的好地方,例如索引(子字符串搜索)。