Remove a string element from a list of strings if its leading characters match another string element in the list

Asked: 2019-06-24 12:59:10

Tags: python list

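I want to look up and compare the string elements in a list, then remove every element that is just the beginning of another string element in the list (that is, the two share the same starting point).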

list1 = [ 'a boy ran' , 'green apples are worse' , 'a boy ran towards the mill' ,  ' this is another sentence ' , 'a boy ran towards the mill and fell',.....]

I intend to end up with a list that looks like this:

list2 = [  'green apples are worse' , ' this is another sentence ' , 'a boy ran towards the mill and fell',.....]

In other words, among the string elements that start with the same leading characters, I want to keep only the longest one.

3 Answers:

Answer 0: (score: 3)

Here is one way you can achieve this:

list1 = [ 'a boy ran' , 'green apples are worse' , 'a boy ran towards the mill' ,  ' this is another sentence ' , 'a boy ran towards the mill and fell']
list2 = []
for i in list1:
    keep = True  # named `keep` so the built-in `bool` is not shadowed
    for j in list1:
        # drop i if it is the beginning of some other element j
        if i is not j and j.startswith(i):
            keep = False
    if keep:
        list2.append(i)
print(list2)
# ['green apples are worse', ' this is another sentence ', 'a boy ran towards the mill and fell']
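The nested loop above can also be written more compactly with `any()`. This is only a sketch of the same O(n²) idea, not the answerer's code: the function name `drop_prefixes` is hypothetical, and comparing by value (`other != s`) instead of by identity is an assumption made here so that exact duplicates are kept rather than treated as prefixes of each other.

```python
def drop_prefixes(strings):
    """Keep only the strings that are not a proper prefix of another string."""
    return [
        s for s in strings
        # `other != s` skips the string itself and any exact duplicates of it
        if not any(other != s and other.startswith(s) for other in strings)
    ]


list1 = ['a boy ran', 'green apples are worse', 'a boy ran towards the mill',
         ' this is another sentence ', 'a boy ran towards the mill and fell']
print(drop_prefixes(list1))
```

The original order of the surviving elements is preserved because the comprehension walks `strings` from front to back.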

Answer 1: (score: 3)

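As suggested by John Coleman in comments, you can first sort the sentences and then compare consecutive sentences. If one sentence is a prefix of another, it appears before that sentence in the sorted list, and every sentence sorted between the two also starts with that prefix, so we only need to compare consecutive sentences. To preserve the original order, you can use a set for fast lookup of the filtered elements.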

list1 = ['a boy ran', 'green apples are worse', 
         'a boy ran towards the mill', ' this is another sentence ',
         'a boy ran towards the mill and fell']                                                                

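srtd = sorted(list1)
filtered = set(list1)
for a, b in zip(srtd, srtd[1:]):
    if b.startswith(a):
        # discard() instead of remove(): no KeyError if a was already dropped
        # (this happens when list1 contains duplicate strings)
        filtered.discard(a)

# rebuild in the original order, keeping only the surviving sentences
list2 = [x for x in list1 if x in filtered]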

list2 is then the following:

['green apples are worse',
 ' this is another sentence ',
 'a boy ran towards the mill and fell']

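At O(n log n), this is considerably faster than comparing all pairs of sentences in O(n²), but if the list is not too long, Vicrobot's much simpler solution will work just as well.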
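To see why comparing only neighbours is enough, it can help to look at the sorted order directly. The snippet below is a small illustration under the assumption that the input is the example list from the question; the name `prefix_pairs` is introduced here purely for demonstration.

```python
list1 = ['a boy ran', 'green apples are worse',
         'a boy ran towards the mill', ' this is another sentence ',
         'a boy ran towards the mill and fell']

srtd = sorted(list1)
for s in srtd:
    print(repr(s))  # repr() makes leading and trailing spaces visible

# Adjacent pairs in which the first string is a prefix of the second.
prefix_pairs = [(a, b) for a, b in zip(srtd, srtd[1:]) if b.startswith(a)]
print(prefix_pairs)
```

The three "a boy ran" sentences end up next to each other, shortest first, so each shorter one is caught by a single `startswith` check against its successor.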

Answer 2: (score: 3)

The wording of your question is a bit ambiguous about how you want to handle ['a','ab','ac','add']. I'm assuming you want ['ab','ac','add'].

The following also assumes that you don't have any empty strings. That is not a great assumption.

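Basically, we build a tree from the input values and keep only the leaf nodes. This is probably the most complicated way to do it. I think it has the potential to be the most efficient, but I'm not sure, and it isn't what you asked for anyway.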

from collections import defaultdict
from typing import Collection, Dict, Generator, Iterable, List, Union

# Exploded is a recursive data type representing a culled list of strings as a tree of character-by-character common prefixes. The leaves are the non-common suffixes.
Exploded = Dict[str, Union["Exploded", str]]

def explode(subject:Iterable[str])->Exploded:
    heads_to_tails = defaultdict(list)
    for s in subject:
        if s:
            heads_to_tails[s[0]].append(s[1:])
    return {
        head: prune_or_follow(tails)
        for (head, tails)
        in heads_to_tails.items()
    }

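def prune_or_follow(tails: List[str]) -> Union[Exploded, str]:
    # Recurse only while some tail still has characters left. If every tail
    # is empty (exact duplicates in the input), stop so the string is kept.
    if 1 < len(tails) and any(tails):
        return explode(tails)
    else:  # we just assume tails is not empty.
        return tails[0]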

def implode(tree: Exploded, prefix :Iterable[str] = ()) -> Generator[str, None, None]:
    for (head, continued) in tree.items():
        if isinstance(continued, str):
            yield ''.join((*prefix, head, continued))
        else:
            yield from implode(continued, (*prefix, head))

def cull(subject: Iterable[str]) -> Collection[str]:
    return list(implode(explode(subject)))

print(cull(['a','ab','ac','add']))
print(cull([ 'a boy ran' , 'green apples are worse' , 'a boy ran towards the mill' ,  ' this is another sentence ' , 'a boy ran towards the mill and fell']))
print(cull(['a', 'ab', 'ac', 'b', 'add']))
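For comparison, the same "build a tree, keep only the leaves" idea can be sketched with a plain nested-dict trie. This is an alternative illustration, not the code above: the names `keep_longest`, `END` and `walk` are hypothetical, results come out in trie order rather than input order, exact duplicates collapse into one entry, and the recursive walk assumes strings shorter than Python's recursion limit.

```python
def keep_longest(strings):
    """Return the strings that are not a proper prefix of another string."""
    END = object()  # hypothetical marker key: some input string ends here
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})  # descend, creating nodes as needed
        node[END] = True

    result = []

    def walk(node, prefix):
        children = [(ch, child) for ch, child in node.items() if ch is not END]
        # A node that ends a string and has no children is a leaf: keep it.
        if END in node and not children:
            result.append(prefix)
        for ch, child in children:
            walk(child, prefix + ch)

    walk(root, "")
    return result


print(keep_longest(['a', 'ab', 'ac', 'add']))
```

Building the trie touches each character of each input string once, which is the intuition behind expecting this family of approaches to scale with the total input length.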

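Edit:
I've tidied up some of the calls; hopefully this version is easier to read and reason about. What bugs me is that I can't work out the runtime complexity of this process. I think it's O(nm), where m is the length of the overlapping prefixes, compared with O(nm log(n)) for the string-comparison approach...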

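Edit:
I started this other question on Code Review, hoping someone could help me understand the complexity a bit better. Someone there pointed out that the code as written didn't actually work: groupby is junk by any sane interpretation of its name. I've replaced the code above, and it's also easier to read this way.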

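Edit:
OK, I've incorporated some excellent suggestions from CR. At this point, I'm fairly sure that the runtime complexity is better than that of the sort-based option.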