List comprehension

Question

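I have a list of dictionaries, l = [d1, d2, ..., d100], where each dictionary is defined with the keys 'id', 'address' and 'price'. I want to get all dictionaries d from the list l whose 'price' value equals 50. Is there a faster way than using a for loop? This processing is already wrapped inside another for loop, so if possible I'd rather not have two nested for loops. The skeleton of the function currently looks like this: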

for ... :  # external for loop
    results = []
    for d in l:
        if d['price'] == 50:
            results.append(d)

Answer 1

You can use a list comprehension:

results = [d for d in l if d['price'] == 50]

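This is not algorithmically different from your loop (it still has to iterate over the entire list), but the comprehension is optimized: in CPython each element is added with a dedicated LIST_APPEND bytecode instead of looking up and calling results.append on every iteration, so it is usually faster. Another option is to make results a lazy iterator: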

# generator expression
results = (d for d in l if d['price'] == 50)

# filter (not the most elegant/readable with lambda)
results = filter(lambda d: d['price'] == 50, l)

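This does not iterate over the list when results is declared. The iteration only happens when you iterate over results (which you can do only once). This can help if you don't always need results, or only need part of it. (In Python 2, filter returns a list, so it is not lazy there.)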
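A small sketch of the lazy behaviour described above, using a made-up four-item list (only the 'price' key matters here). It shows that a generator expression can be consumed only once, that `filter` is lazy in the same way in Python 3, and that `next()` with a default can stop at the first match without scanning the rest of the list.

```python
# Hypothetical sample data: ids 0..3 with these prices.
l = [{'id': i, 'price': p} for i, p in enumerate([50, 40, 50, 60])]

# Lazy: nothing is scanned until the generator is consumed.
results = (d for d in l if d['price'] == 50)

first_pass = list(results)   # consumes the generator
second_pass = list(results)  # already exhausted, so this is empty

# In Python 3, filter() is lazy in the same way.
lazy_filter = filter(lambda d: d['price'] == 50, l)

# Partial consumption: stop at the first match, or get None if there is none.
first_match = next((d for d in l if d['price'] == 50), None)

print(len(first_pass), len(second_pass), first_match)
# prints: 2 0 {'id': 0, 'price': 50}
```

The single-use behaviour is the trade-off for laziness: if you need to iterate over the matches more than once, build a list instead.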

Answer 2

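Unless you know something about the structure of the list (for example, that it is sorted by price, or that only three items can have that price), we cannot make the algorithm faster than O(n) (linear time). So we have to loop.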
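To illustrate the "sorted by price" exception: if the list is sorted by price (an assumption, not something the question states), binary search from the standard `bisect` module finds the block of price-50 entries in O(log n) comparisons plus the size of the result. The helper `find_price` and the parallel `prices` list are hypothetical names used only for this sketch.

```python
from bisect import bisect_left, bisect_right


def find_price(sorted_dicts, prices, target):
    """Return the dicts whose 'price' equals target.

    Assumes sorted_dicts is sorted by 'price' and prices is the parallel
    list of those prices, built once (e.g. outside any outer loop).
    """
    lo = bisect_left(prices, target)   # first index with price >= target
    hi = bisect_right(prices, target)  # first index with price > target
    return sorted_dicts[lo:hi]


# Hypothetical data, sorted by price once up front.
l = [{'id': 1, 'price': 60}, {'id': 2, 'price': 50},
     {'id': 3, 'price': 40}, {'id': 4, 'price': 50}]
l_sorted = sorted(l, key=lambda d: d['price'])
prices = [d['price'] for d in l_sorted]

print(find_price(l_sorted, prices, 50))
# prints: [{'id': 2, 'price': 50}, {'id': 4, 'price': 50}]
```

Sorting and building `prices` cost O(n log n) and O(n) once, so this only pays off when the same list is queried many times, for example inside the outer loop from the question.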

List comprehension

We can use a list comprehension, for example:

[d for d in l if d.get('price') == 50]

(This also filters out dictionaries that have no 'price' key.)
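A quick sketch of that difference, with a hypothetical list in which one dictionary lacks the 'price' key: indexing with d['price'] raises KeyError, while d.get('price') returns None, which never equals 50, so that dictionary is simply skipped.

```python
# Hypothetical data: the second dict has no 'price' key.
l = [{'id': 1, 'price': 50}, {'id': 2}, {'id': 3, 'price': 50}]

# .get() returns None for the missing key, so that dict is filtered out.
safe = [d for d in l if d.get('price') == 50]

# Plain indexing fails on the dict without 'price'.
try:
    strict = [d for d in l if d['price'] == 50]
except KeyError as exc:
    strict = None
    print('KeyError:', exc)  # prints: KeyError: 'price'

print([d['id'] for d in safe])  # prints: [1, 3]
```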

Pandas

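We can also use pandas. Pandas is an efficient data-analysis library, and when the data is large it tends to outperform plain Python loops. Here we can load the dictionaries into a DataFrame, filter it, and then retrieve a list of dictionaries. Note that these will be different dictionaries (i.e. other objects containing the same data), so the data is "copied".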

import pandas as pd
df = pd.DataFrame(l)
result = df[df.price == 50].to_dict(orient='records')  # one dict per matching row

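So here we filter with df.price == 50. Note that some looping still happens behind the scenes to do the filtering, although it runs in pandas' vectorized (compiled) code rather than in the Python interpreter.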
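A minimal sketch of the pandas route with a small hypothetical list, assuming pandas is installed. It builds the boolean mask explicitly, converts the matching rows back with to_dict(orient='records'), which returns one dictionary per row, and checks that the returned dictionaries are equal to the originals but are new objects.

```python
import pandas as pd

# Hypothetical sample data with the keys from the question.
l = [
    {'id': 1, 'address': 'A St', 'price': 50},
    {'id': 2, 'address': 'B St', 'price': 40},
    {'id': 3, 'address': 'C St', 'price': 50},
]

df = pd.DataFrame(l)
mask = df['price'] == 50                     # boolean Series, one entry per row
result = df[mask].to_dict(orient='records')  # list of dicts, one per matching row

print(result)
# Equal values, but new dict objects (the data has been copied).
print(result[0] == l[0], result[0] is l[0])  # prints: True False
```

Because the dictionaries are rebuilt from the DataFrame, mutating an entry in result does not affect the corresponding dictionary in l.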

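This is also a more declarative approach: the code says what it is doing rather than how. How pandas performs the filtering is not your concern, and the syntax shows quite elegantly that you are filtering the data.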

Answer 3

tl;dr: you can't go wrong with a list comprehension.

I explored the following methods:

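1. Basic for loop
2. List comprehension with an if comparison
3. Built-in filter function with a lambda expression
4. List comprehension over a generator expression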

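I tested these methods in Python 2.7.12 and Python 3.5.2 (not the latest versions). It seems that in Python 2 the best method is method 4, and in Python 3 the best is method 2 (at least for my versions, which again are not the latest).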

Here are the results for Python 2.7.12:

# 2.7.12
# [GCC 5.4.0 20160609]
# Method 1 found 496 item in 0.382161 seconds. (basic for-loop)
# Method 2 found 496 item in 0.365456 seconds. (list comprehension)
# Method 3 found 496 item in 0.565614 seconds. (built in filter function)
# Method 4 found 496 item in 0.273335 seconds. (list comprehension over a generator expression)

Here are the results for Python 3.5.2:

# 3.5.2 
# [GCC 5.4.0 20160609]
# Method 1 found 493 item in 0.500266 seconds. (basic for-loop)
# Method 2 found 493 item in 0.338361 seconds. (list comprehension)
# Method 3 found 493 item in 0.796027 seconds. (built in filter function)
# Method 4 found 493 item in 0.351668 seconds. (list comprehension over a generator expression)

Here is the code used to obtain these results:

import time
import random
import sys

print(sys.version)

l = []
for i in range(10000):
    d = {'price': random.randint(40, 60), 'id': i}
    l.append(d)

#METHOD 1 - basic for-loop
start = time.time()
for _ in range(1000):
    results = []
    for d in l:
        if d['price'] == 50:
            results.append(d)
end = time.time()
print("Method 1 found {} item in {:f} seconds. (basic for-loop)".format(len(results), (end - start)))

#METHOD 2 - list comp with if statement
start = time.time()
results = []
for _ in range(1000):
    results = []
    results = [d for d in l if d['price'] == 50]
end = time.time()
print("Method 2 found {} item in {:f} seconds. (list comprehension)".format(len(results), (end - start)))

#METHOD 3 - using filter and a lambda expression
start = time.time()
results = []
for _ in range(1000):
    results = []
    results = list(filter(lambda d: d['price'] == 50, l))
end = time.time()
print("Method 3 found {} item in {:f} seconds. (built in filter function)".format(len(results), (end - start)))

#METHOD 4 - list comp over generator expression
start = time.time()
results = []
for _ in range(1000):
    results = []
    genResults = (d for d in l if d['price'] == 50)
    results = [it for it in genResults]
end = time.time()
print("Method 4 found {} item in {:f} seconds. (list comprehension over a generator expression)".format(len(results), (end - start)))
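For repeat measurements, the standard timeit module runs the timing loop and picks a suitable clock. The sketch below is an adaptation, not the code that produced the numbers above: it wraps the four methods as functions (the names method1 to method4 and the fixed random seed are hypothetical choices), checks that they select the same dictionaries, and then times each one. Absolute timings will vary by machine and Python version.

```python
import random
import timeit

random.seed(0)  # reproducible sample data
l = [{'price': random.randint(40, 60), 'id': i} for i in range(10000)]


def method1():
    # Basic for loop with explicit append.
    results = []
    for d in l:
        if d['price'] == 50:
            results.append(d)
    return results


def method2():
    # List comprehension with an if clause.
    return [d for d in l if d['price'] == 50]


def method3():
    # Built-in filter with a lambda; list() forces evaluation in Python 3.
    return list(filter(lambda d: d['price'] == 50, l))


def method4():
    # List comprehension over a generator expression.
    gen = (d for d in l if d['price'] == 50)
    return [it for it in gen]


methods = [method1, method2, method3, method4]

# All four approaches must select exactly the same dicts, in the same order.
expected = method1()
assert all(m() == expected for m in methods)

if __name__ == '__main__':
    for m in methods:
        # Run each method 200 times; timeit returns the total in seconds.
        secs = timeit.timeit(m, number=200)
        print('{} found {} items in {:.6f} seconds'.format(
            m.__name__, len(expected), secs))
```

Passing the function object directly to timeit.timeit avoids the setup-string boilerplate and works in both Python 2.7 and Python 3.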

Searching for a key value in a list of dictionaries
