Question

I am looking to filter the elements of a list.

Say I have these two lists:

listA = ['banana', 'apple', 'appleRed', 'melon_01', 'appleGreen', 'Orange', 'melon_03']
listB = ['apple', 'melon']

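Now I need to compare the lists and produce a new list containing only the elements of listA whose names start with one of the strings in listB.

The result should be: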

listResult = ['apple', 'appleRed', 'melon_01', 'appleGreen', 'melon_03']

I can do this with an if comparison inside two nested for loops, like so:

listResult = []
for item in listA:
    for fruit in listB:
        if item.startswith(fruit):
            listResult.append(item)
            break

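However, I would like to know whether there is a shortcut for this operation, since the nested loops could take a long time when comparing large lists.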

Answer 1

Use a list comprehension with a generator expression inside any():

[item for item in listA if any(item.startswith(fruit) for fruit in listB)]

Or, as @DSM rightly suggested:

[item for item in listA if item.startswith(tuple(listB))]

This is faster than the first solution, and almost as fast as the regular expression solution proposed by @Iguananaut (but more compact and readable):

In [1]: %timeit [item for item in listA if any(item.startswith(fruit) for fruit in listB)]
100000 loops, best of 3: 4.31 us per loop

In [2]: %timeit [item for item in listA if item.startswith(tuple(listB))]
1000000 loops, best of 3: 1.56 us per loop

In [3]: %timeit filter(regex.match, listA)
1000000 loops, best of 3: 1.39 us per loop
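
As a self-contained sketch of the two list-comprehension approaches above, using the lists from the question, the snippet below checks that both give the same result. The function names are illustrative and not part of the original answer. The second form relies on the fact that str.startswith accepts a tuple of prefixes.

```python
listA = ['banana', 'apple', 'appleRed', 'melon_01', 'appleGreen', 'Orange', 'melon_03']
listB = ['apple', 'melon']


def filter_with_any(items, prefixes):
    # One startswith call per (item, prefix) pair; any() stops at the first hit.
    return [item for item in items if any(item.startswith(p) for p in prefixes)]


def filter_with_tuple(items, prefixes):
    # str.startswith accepts a tuple of prefixes; build it once, outside the loop.
    prefix_tuple = tuple(prefixes)
    return [item for item in items if item.startswith(prefix_tuple)]


result_any = filter_with_any(listA, listB)
result_tuple = filter_with_tuple(listA, listB)

print(result_any == result_tuple)
print(result_tuple)
```

Building the tuple once, rather than calling tuple(listB) for every item, avoids repeated conversion when listA is large. Absolute timings will vary by machine and Python version.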

Answer 2

If listB contains relatively few items, you can convert it into a regular expression fairly efficiently:

import re
regex = re.compile(r'^(?:%s)' % '|'.join(listB))
filter(regex.match, listA)

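This was the first thing that came to mind, but I expect others will have different ideas.

Note that the other answers using list comprehensions are of course perfectly correct and reasonable. I assumed you wanted to know whether there was a way to make it slightly faster. It should also be stressed that this solution may not always be faster in the general case, but in this case it is, slightly: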

In [9]: %timeit [item for item in listA if any(item.startswith(fruit) for fruit in listB)]
100000 loops, best of 3: 8.17 us per loop

In [10]: %timeit filter(regex.match, listA)
100000 loops, best of 3: 2.62 us per loop
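
A hedged variant of the regular expression approach for Python 3 might look like the sketch below. It assumes the entries of listB are meant as literal prefixes, so each one is passed through re.escape; the names pattern and regex_result are illustrative.

```python
import re

listA = ['banana', 'apple', 'appleRed', 'melon_01', 'appleGreen', 'Orange', 'melon_03']
listB = ['apple', 'melon']

# re.escape guards against prefixes containing regex metacharacters such as '.' or '+'.
pattern = re.compile('|'.join(re.escape(prefix) for prefix in listB))

# Pattern.match only matches at the start of the string, so no explicit '^' is needed.
# In Python 3, filter() returns a lazy iterator, so wrap it in list() to get a list.
regex_result = list(filter(pattern.match, listA))

print(regex_result)
```

One caveat: if listB is empty, the joined pattern is an empty string, which matches every item, so that case may need to be handled separately.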

Answer 3

listResult = [i for i in listA if any(i.startswith(j) for j in listB)]

Comparing and filtering list elements in Python
