Question

I have a list like this:

Cat
Cat eats
Cat eats food
Dog ears
Dog ears listen
Rabbit

But I want to create a new list that contains only the base strings, so in this case:

Cat
Dog ears
Rabbit

I figured I could do a nested for loop:

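items = ["Cat", "Cat eats", "Cat eats food", "Dog ears", "Dog ears listen", "Rabbit"]
result = []
for item in items:
    # Keep item only if no other item is a substring of it
    if not any(other != item and other in item for other in items):
        result.append(item)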

...but this is a very large dataset, so I'm wondering whether anyone has an idea that is more efficient than an O(n^2) approach.

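Edit: This is not a question about coding syntax or a bug; I know how to code what I suggested above. The question is whether there is a logical way to do this without a nested for loop, because that is inefficient.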

Answer 1

You can group the phrases by their first word and keep the shortest phrase in each group:

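import itertools

s = """
Cat
Cat eats
Cat eats food
Dog ears
Dog ears listen
Rabbit
"""
# Split the text into non-empty lines, then split each line into words
listings = [line.split() for line in s.split('\n') if line]
# groupby only merges adjacent items, so the input must already be ordered by first word
new_s = [' '.join(min(group, key=len))
         for _, group in itertools.groupby(listings, key=lambda words: words[0])]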

Output:

['Cat', 'Dog ears', 'Rabbit']
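
To run this on an arbitrary list, the snippet can be wrapped in a function. The sketch below uses a hypothetical helper name, `base_phrases_by_first_word`, and sorts the word lists first, since `itertools.groupby` only merges adjacent items. It also shows a limitation of grouping by first word: only one phrase per first word survives, even when neither phrase contains the other.

```python
import itertools


def base_phrases_by_first_word(lines):
    """Keep the shortest phrase in each group of phrases sharing a first word.

    Hypothetical helper wrapping the groupby approach above. Sorting first
    makes phrases with the same first word adjacent, which groupby requires.
    """
    listings = sorted(line.split() for line in lines if line.strip())
    return [
        " ".join(min(group, key=len))
        for _, group in itertools.groupby(listings, key=lambda words: words[0])
    ]


print(base_phrases_by_first_word(
    ["Cat", "Cat eats", "Cat eats food", "Dog ears", "Dog ears listen", "Rabbit"]
))
# ['Cat', 'Dog ears', 'Rabbit']

# Caveat: one phrase per first word, even though "Dog tail" does not contain "Dog ears"
print(base_phrases_by_first_word(["Dog ears", "Dog tail"]))
# ['Dog ears']
```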

Answer 2

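First, sort the list, so that each base phrase comes directly before the phrases that extend it. Then keep a variable that holds the most recent base phrase and skip every phrase that starts with it. When you reach a phrase that doesn't, add it to the list of base phrases and make it the new base.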

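phrases = ["Cat", "Cat eats", "Cat eats food", "Dog ears", "Dog ears listen", "Rabbit"]
# After sorting, all phrases that start with a given base form one run right after it
sorted_list = sorted(phrases)

base = sorted_list[0]
new_list = [base]

for phrase in sorted_list:
    # A phrase that starts with the current base is an extension of it; skip it
    if not phrase.startswith(base):
        new_list.append(phrase)
        base = phrase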
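
A minimal sketch of the same idea as a reusable function (the name `base_phrases` is hypothetical). It avoids indexing `sorted_list[0]`, so an empty input returns an empty list. Sorting costs O(n log n) comparisons and the scan is a single pass, compared with roughly n² substring checks for the nested loop. Note that `startswith` tests prefixes, not arbitrary substrings: that matches the example data, but not the more general "sub-string" wording of the question.

```python
def base_phrases(phrases):
    """Return the phrases that do not start with any other phrase in the input.

    Sketch of the sort-then-scan approach: after sorting, every phrase that
    starts with a given base lands in one contiguous run directly after that
    base, so a single pass that tracks the current base is enough.
    """
    new_list = []
    base = None
    for phrase in sorted(phrases):
        # Skip extensions of the current base (and exact duplicates of it)
        if base is not None and phrase.startswith(base):
            continue
        new_list.append(phrase)
        base = phrase
    return new_list


print(base_phrases(["Dog ears listen", "Cat eats", "Rabbit", "Cat", "Dog ears", "Cat eats food"]))
# ['Cat', 'Dog ears', 'Rabbit']

# Prefix-only: "eats" is a substring of "Cat eats" but not a prefix, so both are kept
print(base_phrases(["Cat eats", "eats"]))
# ['Cat eats', 'eats']
```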
