Question

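In a Python program I'm writing, I compared a for loop with increment variables against a list comprehension with map(itemgetter) and len() for counting entries in dictionaries that are stored in a list. Each method takes the same amount of time. Am I doing something wrong, or is there a better approach?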

Here is a greatly simplified and shortened version of the data structure:

list = [
  {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'biscuits and gravy'},
  {'key1': False, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'peaches and cream'},
  {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': False, 'filenotfound': 'Abbott and Costello'},
  {'key1': False, 'dontcare': False, 'ignoreme': True, 'key2': False, 'filenotfound': 'over and under'},
  {'key1': True, 'dontcare': True, 'ignoreme': False, 'key2': True, 'filenotfound': 'Scotch and... well... neat, thanks'}
]

Here is the for loop version:

#!/usr/bin/env python
# Python 2.6
# count the entries where key1 is True
# keep a separate count for the subset that also have key2 True

key1 = key2 = 0
for dictionary in list:
    if dictionary["key1"]:
        key1 += 1
        if dictionary["key2"]:
            key2 += 1
print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)

Output for the above data:

Counts: key1: 3, subset key2: 2

Here is another, perhaps more Pythonic, version:

#!/usr/bin/env python
# Python 2.6
# count the entries where key1 is True
# keep a separate count for the subset that also have key2 True
from operator import itemgetter
KEY1 = 0
KEY2 = 1
getentries = itemgetter("key1", "key2")
entries = map(getentries, list)
key1 = len([x for x in entries if x[KEY1]])
key2 = len([x for x in entries if x[KEY1] and x[KEY2]])
print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)

Output for the above data (same as before):

Counts: key1: 3, subset key2: 2

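I'm a bit surprised that these take the same amount of time. I wonder whether there is something faster. I'm sure I'm overlooking something simple.

One alternative I've considered is loading the data into a database and running SQL queries, but the data doesn't need to persist, I would have to profile the overhead of the data transfer and so on, and a database may not always be available.

I have no control over the original form of the data.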

The code above is not going for style points.

Answer 1

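I think you're measuring incorrectly, by swamping the code to be measured in a lot of overhead (running at top module level instead of in a function, doing output). Putting the two snippets into functions named forloop and withmap, and adding * 100 to the list's definition (after the closing ]) to make the measurement a little more substantial, I see, on my slow laptop: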

$ py26 -mtimeit -s'import co' 'co.forloop()'
10000 loops, best of 3: 202 usec per loop
$ py26 -mtimeit -s'import co' 'co.withmap()'
10 loops, best of 3: 601 usec per loop
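
For reference, here is a minimal sketch of what the co module timed above might contain. The answer does not show the file, so its layout is an assumption: the data is named records (a hypothetical name, chosen to avoid shadowing the built-in list), both functions return their counts instead of printing them, and withmap wraps map in list() so the sketch also runs on Python 3, where map returns a one-shot iterator.

```python
# co.py: a sketch of the module timed above; the file contents are assumed.
from operator import itemgetter

# Sample data from the question, repeated 100 times to make the
# measurement more substantial. `records` is a hypothetical name.
records = [
    {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'biscuits and gravy'},
    {'key1': False, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'peaches and cream'},
    {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': False, 'filenotfound': 'Abbott and Costello'},
    {'key1': False, 'dontcare': False, 'ignoreme': True, 'key2': False, 'filenotfound': 'over and under'},
    {'key1': True, 'dontcare': True, 'ignoreme': False, 'key2': True, 'filenotfound': 'Scotch and... well... neat, thanks'},
] * 100


def forloop():
    """Count with a plain loop and two counters."""
    key1 = key2 = 0
    for dictionary in records:
        if dictionary["key1"]:
            key1 += 1
            if dictionary["key2"]:
                key2 += 1
    return key1, key2


def withmap():
    """Count with map(itemgetter) and two list comprehensions."""
    getentries = itemgetter("key1", "key2")
    # list() keeps the pairs reusable for the second comprehension.
    entries = list(map(getentries, records))
    key1 = len([x for x in entries if x[0]])
    key2 = len([x for x in entries if x[0] and x[1]])
    return key1, key2
```

Keeping the work inside functions means local-variable lookups are used in the loop and no output is produced during the timed run, which are the two sources of overhead named above.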

I.e., the allegedly "more pythonic" approach with map is three times slower than the plain for approach, which tells you it's not really "more pythonic" ;-).

The mark of good Python is simplicity, which to me recommends what I immodestly named...:

def thebest():
  entries = [d['key2'] for d in list if d['key1']]
  return len(entries), sum(entries)

which, when measured, takes 10% to 20% less time than the forloop approach.
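
The reason one list can yield both counts: entries holds the key2 value of every dictionary whose key1 is true, so len(entries) is the key1 count, and since True adds 1 and False adds 0 under sum(), sum(entries) is the size of the key2 subset. A small self-contained check, using a hypothetical three-row records list in place of the question's data and assuming the key2 values are real booleans:

```python
# Hypothetical three-row sample; `records` stands in for the question's data.
records = [
    {'key1': True, 'key2': True},
    {'key1': False, 'key2': True},
    {'key1': True, 'key2': False},
]

# key2 values of only those rows where key1 is true
entries = [d['key2'] for d in records if d['key1']]

key1_count = len(entries)  # number of rows with key1 true
key2_count = sum(entries)  # True adds 1, False adds 0
```

If the values could be truthy objects other than booleans, sum(1 for v in entries if v) would be the safer way to count them.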

Counting entries in a list of dictionaries: for loop vs. list comprehension with map(itemgetter)
