Find the longest prefix present in at least 2 elements of a "random" list of strings

Date: 2017-11-02 18:17:55

Tags: python string python-3.x list prefix

Given a list of strings, for example:

myList = ["foo", "foobar", "football", "footbag", "bar"]

find the longest prefix that is present in at least 2 strings of the list:

Longest prefix is "footba" present in "football" and "footbag"

The list will be populated from input, and not every list will have a common prefix.

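For a prefix to count as a candidate, it is enough that it appears on two strings in the list. If there are several candidates, the longest one must be returned.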
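A direct, brute-force reading of this requirement: count, for every prefix of every word, how many words start with it, and keep the longest prefix with a count of at least 2. The function name `longest_shared_prefix` is hypothetical, and this is a sketch rather than an efficient solution, since it builds every prefix of every word:

```python
from collections import Counter


def longest_shared_prefix(words):
    # Count, for every non-empty prefix, how many words start with it.
    # Each word contributes each of its prefixes exactly once.
    counts = Counter(w[:k] for w in words for k in range(1, len(w) + 1))
    # Keep only prefixes shared by at least two words
    shared = [p for p, n in counts.items() if n >= 2]
    # The longest one wins; '' means no two words share even a first character
    return max(shared, key=len, default='')


print(longest_shared_prefix(["foo", "foobar", "football", "footbag", "bar"]))  # footba
```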

In my research, I have found how to get the longest common prefix shared by all of the strings, for example:

List: ["foo_a","foo_b","foo_c","fnord"]

Output: Longest common prefix is "f"

However, the strings in my list may not even start with the same letter.
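For illustration, Python's standard library already provides the all-strings version as `os.path.commonprefix`, which compares strings character by character (despite its name, it is not limited to paths). Applied to the list above, it shows why that approach does not help here:

```python
import os.path

# Longest prefix common to ALL strings, compared character by character
print(os.path.commonprefix(["foo_a", "foo_b", "foo_c", "fnord"]))  # f

# "bar" shares nothing with the other words, so the prefix common
# to all of them is empty, even though "footba" appears twice
print(repr(os.path.commonprefix(["foo", "foobar", "football", "footbag", "bar"])))  # ''
```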

2 answers:

Answer 0 (score: 2)

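You can build a forest of prefix tries and then search for the deepest node (the one farthest from the root) that lies on the path of at least two words, for example a node with two (non-empty) children. This node represents the longest common prefix.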
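A minimal sketch of this idea, assuming a single trie whose root's children play the role of the "forest" (one subtree per first letter). Each node stores how many words pass through it, so any node counted by two or more words marks a prefix shared by at least two strings. The names `TrieNode` and `longest_prefix_trie` are hypothetical:

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.count = 0      # number of words that pass through this node


def longest_prefix_trie(words):
    root = TrieNode()
    # Insert every word, counting how many words pass through each node
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
            node.count += 1

    # Depth-first search for the deepest node shared by at least two words
    best = ""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node.children.items():
            # A child used by fewer than 2 words cannot lead to a shared prefix
            if child.count >= 2:
                if len(prefix) + 1 > len(best):
                    best = prefix + ch
                stack.append((child, prefix + ch))
    return best


print(longest_prefix_trie(["foo", "foobar", "football", "footbag", "bar"]))  # footba
```

Counting words per node, rather than counting children, also covers the case where one word is a prefix of another, such as "foo" and "foobar".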

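If you don't care about performance, you can simply iterate over all the words in the list, compare each one (its prefix) with each of the remaining words, and keep track of the maximum as you go: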

def common_prefix_size(s1, s2):
    # Count matching characters from the start until the first mismatch
    res = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            break
        res += 1
    return res


def longest_prefix(lst):
    res = ''
    maxsize = 0
    # Compare every pair of words exactly once
    for i in range(len(lst) - 1):
        for j in range(i + 1, len(lst)):
            t = common_prefix_size(lst[i], lst[j])
            # Keep the longest shared prefix seen so far
            if t > maxsize:
                maxsize = t
                res = lst[i][:t]
    return res

myList = ["foo", "foobar", "football", "footbag", "bar"]

print(longest_prefix(myList)) # footba
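A different technique, not part of this answer: if the list is sorted first, any prefix shared by two words is also shared by some pair of neighbours in sorted order (every word between them starts with that prefix too), so comparing adjacent words can replace the nested loop. A minimal sketch; the name `longest_prefix_sorted` is hypothetical:

```python
def longest_prefix_sorted(lst):
    # In sorted order, the longest prefix shared by ANY two words
    # is always shared by some pair of adjacent words
    words = sorted(lst)
    best = ""
    for a, b in zip(words, words[1:]):
        # Length of the common prefix of two neighbouring words
        n = 0
        for ca, cb in zip(a, b):
            if ca != cb:
                break
            n += 1
        if n > len(best):
            best = a[:n]
    return best


myList = ["foo", "foobar", "football", "footbag", "bar"]
print(longest_prefix_sorted(myList))  # footba
```

Sorting costs O(n log n) string comparisons, after which the scan needs only n - 1 prefix comparisons instead of n(n - 1)/2.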

Answer 1 (score: 1)

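This is a messy implementation that won't be efficient for large lists, but it gets the job done. I'd suggest looking into the prefix tries mentioned above; if you have a bit of time, they will work better.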

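This works backwards from the full word length until two words share the same prefix. At each length it chops the end off every word that is long enough, counts how many times each truncated word occurs, and returns as soon as some count is at least 2.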

from collections import defaultdict

def list_to_text(x):
    x = list(map(str, x))
    first = '", "'.join(x[:-1])  # Join all but the last item (a, b, c)
    if first:
        return '" and "'.join((first, x[-1]))  # Add the last item (a, b, c and d)
    return x[0]  # Return a single value if the list length is 1

def find_longest_prefix(x):
    x_max = len(max(x, key=len))
    # Try every prefix length, starting with the full length of the longest word
    for i in range(x_max, 0, -1):

        # Chop every word that is long enough down to its first i characters
        trim = [j[:i] for j in x if len(j) >= i]

        # Group the unique prefixes by how many words share them
        result = defaultdict(list)
        for j in set(trim):
            result[trim.count(j)].append(j)

        # Finish iterating if at least 2 words share a prefix
        highest_count = max(result)
        if highest_count >= 2:
            prefix = result[highest_count]
            words = [j for k in prefix for j in x if j.startswith(k)]
            return prefix, words

    # No two words share even their first character
    return [], []

myList = ["foo", "foobar", "football", "footbag", "bar"]
prefix, words = find_longest_prefix(myList)

# Put together the string
if prefix:
    print('The longest common prefix{} "{}" present in "{}".'.format(' is' if len(prefix) == 1 else 'es are',
                                                                     list_to_text(prefix), list_to_text(words)))
else:
    print('No prefix is shared by at least two words.')

It formats the string according to the number of results. Your list still results in:

The longest common prefix is "footba" present in "football" and "footbag".

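But adding another prefix with the same length and the same number of matches (for example, adding "testing" and "testin" to the list) results in something like this: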

The longest common prefixes are "footba" and "testin" present in "football", "footbag", "testing" and "testin".