Question

I'm looking for the most efficient way to reduce a given list based on substrings already present in the list.

For example:

mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']

would be reduced to:

mylist = ['abcd','qrs']

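because both 'abcd' and 'qrs' are the shortest substrings of other elements in that list. I was able to do this in about 30 lines of code, but I suspect there is a crafty one-liner out there.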

Answer 1

This seems to work (but I suspect it is not very efficient):

def reduce_prefixes(strings):
    sorted_strings = sorted(strings)
    return [element
            for index, element in enumerate(sorted_strings)
            if all(not previous.startswith(element) and
                   not element.startswith(previous)
                   for previous in sorted_strings[:index])]

Tests:

>>> reduce_prefixes(['abcd', 'abcde', 'abcdef',
...                  'qrs', 'qrst', 'qrstu'])
['abcd', 'qrs']
>>> reduce_prefixes(['abcd', 'abcde', 'abcdef',
...                  'qrs', 'qrst', 'qrstu',
...                  'gabcd', 'gab', 'ab'])
['ab', 'gab', 'qrs']
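
A possible refinement, offered only as a sketch and not as part of the answer above: in a sorted list, every string that starts with a given prefix sits directly after that prefix, so each element only needs to be compared with the most recently kept one. The name reduce_prefixes_sorted is made up for this illustration.

```python
def reduce_prefixes_sorted(strings):
    """Keep only the strings that have no shorter prefix in the input."""
    kept = []
    for element in sorted(strings):
        # In sorted order, anything prefixed by kept[-1] follows it directly,
        # so one comparison against the last kept string is enough.
        if not kept or not element.startswith(kept[-1]):
            kept.append(element)
    return kept


print(reduce_prefixes_sorted(['abcd', 'abcde', 'abcdef',
                              'qrs', 'qrst', 'qrstu',
                              'gabcd', 'gab', 'ab']))
# ['ab', 'gab', 'qrs']
```

After the sort this does one startswith check per element, instead of comparing each element against every earlier one.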

Answer 2

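One solution is to iterate over all the strings, split them into groups based on whether they have different characters, and apply the function recursively.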

def reduce_substrings(strings):
    return list(_reduce_substrings(map(iter, strings)))

def _reduce_substrings(strings):
    # A dictionary of characters to a list of strings that begin with that character
    nexts = {}
    for string in strings:
        try:
            nexts.setdefault(next(string), []).append(string)
        except StopIteration:
            # Reached the end of this string. It is the only shortest substring.
            yield ''
            return
    for next_char, next_strings in nexts.items():
        for next_substrings in _reduce_substrings(next_strings):
            yield next_char + next_substrings

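This splits the strings into a dictionary keyed by character, and tries to find the shortest substrings among those that were split into different lists in the dictionary.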

Of course, because of the recursive nature of this function, an efficient one-liner is not possible.
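
The same character-by-character grouping can also be written without recursion by building an explicit trie first. The following is only a sketch of that alternative, not the code from this answer; reduce_with_trie and the END sentinel are names invented for the example.

```python
END = object()  # hypothetical sentinel key marking "a string ends here"


def reduce_with_trie(strings):
    # Build a trie out of nested dictionaries, one level per character.
    trie = {}
    for s in strings:
        node = trie
        for ch in s:
            node = node.setdefault(ch, {})
        node[END] = True

    # Walk the trie with an explicit stack; the first END marker on each
    # path is the shortest string along it, so nothing deeper is visited.
    result = []
    stack = [("", trie)]
    while stack:
        prefix, node = stack.pop()
        if END in node:
            result.append(prefix)
            continue
        for ch, child in node.items():
            stack.append((prefix + ch, child))
    return result


print(sorted(reduce_with_trie(['abcd', 'abcde', 'abcdef',
                               'qrs', 'qrst', 'qrstu'])))
# ['abcd', 'qrs']
```

The result order depends on the traversal, so the example sorts it before printing.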

Answer 3

Probably not the most efficient, but at least it is short:

mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']

outlist = []
for l in mylist:
    if any(o.startswith(l) for o in outlist):
        # l is a prefix of some elements in outlist, so it replaces them
        outlist = [ o for o in outlist if not o.startswith(l) ] + [ l ]
    if not any(l.startswith(o) for o in outlist):
        # l has no prefix in outlist yet, so it becomes a prefix candidate
        outlist.append(l)

print(outlist)
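
As a small illustration, the loop above can be wrapped in a function to check that the input does not need to be sorted. This is just a sketch; the name reduce_incremental is made up here and the logic is unchanged.

```python
def reduce_incremental(strings):
    outlist = []
    for s in strings:
        if any(o.startswith(s) for o in outlist):
            # s is a prefix of some kept elements, so it replaces them
            outlist = [o for o in outlist if not o.startswith(s)] + [s]
        if not any(s.startswith(o) for o in outlist):
            # s has no prefix in outlist yet, so it becomes a candidate
            outlist.append(s)
    return outlist


mylist = ['abcd', 'abcde', 'abcdef', 'qrs', 'qrst', 'qrstu']
print(reduce_incremental(mylist))        # ['abcd', 'qrs']
print(reduce_incremental(mylist[::-1]))  # ['qrs', 'abcd']
```

The same elements survive for both input orders; only the order of the output changes.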

Answer 4

Try this:

import re
mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']
new_list=[]
for i in mylist:
    if re.match("^abcd$",i):
        new_list.append(i)
    elif re.match("^qrs$",i):
        new_list.append(i)
print(new_list)
#['abcd', 'qrs']
