Question

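I have a list l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
I want to remove the elements that another element starts with, if there are any (in this case 'abcd' and 'ghi').
NB: in my situation I know that the "duplicate" elements, if present, can only be 'abcd' or 'ghi'.
To remove them, I used: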

>>> l.remove('abcd') if ('abcdef' in l and 'abcd' in l) else l
>>> l.remove('ghi') if ('ghijklm' in l and 'ghi' in l) else l
>>> l
['abcdef', 'ghijklm', 'xyz', 'pqrs']

Is there a more efficient (or more automated) way to do this?

Answer 1

You can do this in time linear in the number of elements n, with O(n * m²) memory (where m is the length of an element):

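prefixes = {}
for word in l:
    # Proper, non-empty prefixes: word[:1], word[:2], ..., word[:len(word) - 1]
    for x in range(1, len(word)):
        prefixes[word[:x]] = True

# Keep only the words that are not a prefix of some other word
result = [word for word in l if word not in prefixes]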

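Iterate over each word and build a dictionary holding the word's first character, then its first two characters, then three, and so on up to all characters of the word except the last. Then iterate over the list again: if a word appears in that dictionary, it is a shorter prefix of another word in the list.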
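The same idea can be wrapped in a small function. This is only a sketch: the name `drop_prefix_words` is made up for illustration, and it uses a set instead of a dictionary because only membership matters.

```python
def drop_prefix_words(words):
    """Return the words that are not a proper prefix of another word.

    Hypothetical helper, not part of the original answer.
    """
    # Collect every proper, non-empty prefix of every word.
    prefixes = {word[:n] for word in words for n in range(1, len(word))}
    # Keep the original order; drop any word that is somebody's prefix.
    return [word for word in words if word not in prefixes]


print(drop_prefix_words(['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']))
# ['abcdef', 'ghijklm', 'xyz', 'pqrs']
```

Note that exact duplicates (the same word twice) are kept, since a word is never a proper prefix of itself.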

Answer 2

l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']

for a in l[:]:
    for b in l[:]:
        if a.startswith(b) and a != b:
            l.remove(b)
print(l)

Output

['abcdef', 'ghijklm', 'xyz', 'pqrs']

Answer 3

Try this, it will work

// This function is a factory. When called, creates and returns a new object every time
function ourFactoryFn (firstName, lastName) {
    var a = {
        prop1:  firstName,
        prop2: lastName,
        prop3: firstName + ' ' + lastName + ' says Hello world!'
    }
    return a;
};

// Now, let's use our factory to produce new objects
// let's actually have an example to treat it like real life factories :P
var inputArr = [
    {firstName: 'Barack', lastName: 'Obama'},
    {firstName: 'Narendra', lastName: 'Modi'},
    {firstName: 'Mike', lastName: 'Tyson'},
    {firstName: 'Mahatma', lastName: 'Gandhi'},
    {firstName: 'Donald', lastName: 'Trump'},
    {firstName: 'Priyanka', lastName: 'Chopra'}
];
var outputArr = [];
inputArr.forEach(function (x) {
    var newObj = ourFactoryFn(x.firstName, x.lastName); // we used our factory
    console.dir(newObj); // print the freshly created object
    outputArr.push(newObj);
});

Answer 4

The code below does what you describe.

your_list = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
print("Original list: %s" % your_list)
# Iterate over copies so that removing items does not skip elements
for element in your_list[:]:
    for element2 in your_list[:]:
        if element.startswith(element2) and element != element2:
            print("%s starts with %s" % (element, element2))
            # Remove the shorter item, as asked in the question
            print("Remove: %s" % element2)
            your_list.remove(element2)
print("Removed list: %s" % your_list)

Output:

Original list: ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
abcdef starts with abcd
Remove: abcd
ghijklm starts with ghi
Remove: ghi
Removed list: ['abcdef', 'ghijklm', 'xyz', 'pqrs']

That said, I think there are simpler solutions, and you could solve this with a list comprehension if you prefer.
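A list-comprehension version might look like the sketch below. The names `words` and `result` are chosen here for illustration and do not come from the answer.

```python
words = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']

# Keep a word only if no *other* word in the list starts with it.
result = [
    w for w in words
    if not any(other != w and other.startswith(w) for other in words)
]

print(result)  # ['abcdef', 'ghijklm', 'xyz', 'pqrs']
```

This compares every pair of words, so it is quadratic in the list length, which should be fine for short lists like the one in the question.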

Answer 5

@Andrew Allen's approach

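l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
l = sorted(l)
i = 0
# After sorting, a string that is a prefix of another string
# is immediately followed by a string that starts with it.
while i < len(l) - 1:
    if l[i + 1].startswith(l[i]):
        del l[i]  # drop the shorter item and re-check the same index
    else:
        i += 1
print(l)
#['abcdef', 'ghijklm', 'pqrs', 'xyz']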

Answer 6

You can use

l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
if "abcdef" in l:  # only 1 check for containment instead of 2
    l = [x for x in l if x != "abcd"]  # to remove _all_ abcd
    # or, if you know there is only one abcd in it (remove() works in place
    # and returns None, so do not assign its result back to l):
    # l.remove("abcd")

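This may be slightly faster (if you have more elements than you show), because you check for "abcdef" only once and then scan the list once, up to the first occurrence or over all occurrences to be removed.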

>>> l.remove('abcd') if ('abcdef' in l and 'abcd' in l) else l

checks l for containment twice over its full length (if you are unlucky), and then still has to remove something from it.

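Disclaimer:
If this is not a proven, measured bottleneck and not security-critical, I would not bother doing it unless I had measurements suggesting that this is the biggest time saver in the whole code. With lists of up to a few dozen or a few hundred items (gut feeling; your data does not support any analysis), the estimated gain is negligible.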
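If you do want to measure before optimizing, a rough sketch with the standard `timeit` module could look like this. The function names `two_checks` and `one_check` are made up here, and the timings depend entirely on your machine and data, so no numbers are shown.

```python
import timeit

data = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']


def two_checks(items):
    """The question's approach: two containment checks, then remove()."""
    items = list(items)  # work on a copy so every run sees the same input
    if 'abcdef' in items and 'abcd' in items:
        items.remove('abcd')
    return items


def one_check(items):
    """This answer's approach: one containment check, then a filter."""
    if 'abcdef' in items:
        return [x for x in items if x != 'abcd']
    return list(items)


# Both variants should agree before their speed is compared.
assert two_checks(data) == one_check(data)

for fn in (two_checks, one_check):
    seconds = timeit.timeit(lambda: fn(data), number=10_000)
    print(f"{fn.__name__}: {seconds:.4f} s for 10,000 runs")
```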

Remove the second item that starts with the same substring

6 answers: