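Suppose you have some expensive, CPU-intensive function, for example one that parses an XML string. Here, our trivial stand-in function will be:
def parse(foo):
    return int(foo)
As input you have a list of strings. You want to parse them and find the subset of parsed values that satisfies some condition. Ideally, each string should be parsed only once.
Without a list comprehension, you could write: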
olds = ["1", "2", "3", "4", "5"]
news = []
for old in olds:
    new = parse(old)  # first and only parse
    if new > 3:
        news.append(new)
To do this with a list comprehension, it seems you have to parse twice: once to get the new value and once to perform the conditional check:
olds = ["1", "2", "3", "4", "5"]
news = [
    parse(new)  # first parse
    for new in olds
    if parse(new) > 3  # second parse
]
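To make the cost of the repeated call concrete, here is a minimal sketch that counts how often `parse` runs in the comprehension above. The `call_count` global is a hypothetical counter added only for illustration; it is not part of the original function.

```python
call_count = 0  # hypothetical counter, for illustration only


def parse(foo):
    """Stand-in for an expensive parser that also counts its own calls."""
    global call_count
    call_count += 1
    return int(foo)


olds = ["1", "2", "3", "4", "5"]
news = [
    parse(new)          # runs only for items that passed the filter
    for new in olds
    if parse(new) > 3   # runs for every item
]

print(news)        # [4, 5]
print(call_count)  # 7
```

The filter runs once per input and the output expression runs again for each item that passes, so the number of extra parses depends on how many items survive the condition.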
For example, this syntax does not work:
olds = ["1", "2", "3", "4", "5"]
# Raises SyntaxError: can't assign to function call
news = [i for parse(i) in olds if i > 5]
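For comparison, on Python 3.8 and later (an assumption, since this question targets Python 3.6.5) an assignment expression can bind the parsed value inside the condition and reuse it as the output. This is only a sketch of that alternative, using the same threshold of 3 as the other examples:

```python
def parse(foo):
    return int(foo)


olds = ["1", "2", "3", "4", "5"]

# Requires Python 3.8+: (new := parse(old)) parses once, binds the result
# to `new`, and the same value is reused as the output expression.
news = [new for old in olds if (new := parse(old)) > 3]

print(news)  # [4, 5]
```

Note that the name bound by `:=` lives in the enclosing scope, not inside the comprehension.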
Using a generator seems to work:
def parse(strings):
    for string in strings:
        yield int(string)

olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds) if i > 3]
Alternatively, you can put the conditional inside the generator:
def parse(strings):
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds)]
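What I would like to know is which is better in terms of optimization (not reusability and so on): parsing in the generator and doing the conditional check in the list comprehension, or doing both the parsing and the conditional check in the generator? And is there a better alternative than either of these?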
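One way to approach the question empirically is to time both variants with `timeit`. The sketch below is an assumption-laden harness, not a benchmark result: the function names, the 1000-element input, and the repeat count are all hypothetical choices, and `list(...)` stands in for the `[i for i in ...]` wrapper used above.

```python
import timeit


def parse_only(strings):
    """Generator that only parses; filtering happens in the caller."""
    for string in strings:
        yield int(string)


def parse_and_filter(strings):
    """Generator that parses and filters before yielding."""
    for string in strings:
        val = int(string)
        if val > 3:
            yield val


def filter_in_comprehension(strings):
    return [i for i in parse_only(strings) if i > 3]


def filter_in_generator(strings):
    return list(parse_and_filter(strings))


olds = [str(n) for n in range(1000)]  # hypothetical larger input

# Both variants must agree before their timings are worth comparing.
assert filter_in_comprehension(olds) == filter_in_generator(olds)

for func in (filter_in_comprehension, filter_in_generator):
    seconds = timeit.timeit(lambda: func(olds), number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 runs")
```

With a genuinely expensive `parse`, the parsing cost is likely to dominate either way, so any difference between the two placements of the condition would need to be measured on realistic data.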
Below is some output from dis.dis in Python 3.6.5. Note that in my version of Python, to disassemble the list comprehension we have to use f.__code__.co_consts[1]. See this answer for an explanation.
import dis

def parse(strings):
    for string in strings:
        yield int(string)

def main(strings):
    return [i for i in parse(strings) if i > 3]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]
dis.dis(main.__code__.co_consts[1])
"""
2 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (3)
12 COMPARE_OP 4 (>)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
"""
dis.dis(parse)
"""
2 0 SETUP_LOOP 22 (to 24)
2 LOAD_FAST 0 (strings)
4 GET_ITER
>> 6 FOR_ITER 14 (to 22)
8 STORE_FAST 1 (string)
3 10 LOAD_GLOBAL 0 (int)
12 LOAD_FAST 1 (string)
14 CALL_FUNCTION 1
16 YIELD_VALUE
18 POP_TOP
20 JUMP_ABSOLUTE 6
>> 22 POP_BLOCK
>> 24 LOAD_CONST 0 (None)
26 RETURN_VALUE
"""
def parse(strings):
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

def main(strings):
    return [i for i in parse(strings)]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]
dis.dis(main.__code__.co_consts[1])
"""
2 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 4
>> 14 RETURN_VALUE
"""
dis.dis(parse)
"""
2 0 SETUP_LOOP 34 (to 36)
2 LOAD_FAST 0 (strings)
4 GET_ITER
>> 6 FOR_ITER 26 (to 34)
8 STORE_FAST 1 (string)
3 10 LOAD_GLOBAL 0 (int)
12 LOAD_FAST 1 (string)
14 CALL_FUNCTION 1
16 STORE_FAST 2 (val)
4 18 LOAD_FAST 2 (val)
20 LOAD_CONST 1 (3)
22 COMPARE_OP 4 (>)
24 POP_JUMP_IF_FALSE 6
5 26 LOAD_FAST 2 (val)
28 YIELD_VALUE
30 POP_TOP
32 JUMP_ABSOLUTE 6
>> 34 POP_BLOCK
>> 36 LOAD_CONST 0 (None)
38 RETURN_VALUE
"""
def parse(string):
    return int(string)

def main(strings):
    values = []
    for string in strings:
        value = parse(string)
        if value > 3:
            values.append(value)
    return values

assert main(["1", "2", "3", "4", "5"]) == [4, 5]
dis.dis(main)
"""
2 0 BUILD_LIST 0
2 STORE_FAST 1 (values)
3 4 SETUP_LOOP 38 (to 44)
6 LOAD_FAST 0 (strings)
8 GET_ITER
>> 10 FOR_ITER 30 (to 42)
12 STORE_FAST 2 (string)
4 14 LOAD_GLOBAL 0 (parse)
16 LOAD_FAST 2 (string)
18 CALL_FUNCTION 1
20 STORE_FAST 3 (value)
5 22 LOAD_FAST 3 (value)
24 LOAD_CONST 1 (3)
26 COMPARE_OP 4 (>)
28 POP_JUMP_IF_FALSE 10
6 30 LOAD_FAST 1 (values)
32 LOAD_ATTR 1 (append)
34 LOAD_FAST 3 (value)
36 CALL_FUNCTION 1
38 POP_TOP
40 JUMP_ABSOLUTE 10
>> 42 POP_BLOCK
7 >> 44 LOAD_FAST 1 (values)
46 RETURN_VALUE
"""
dis.dis(parse)
"""
2 0 LOAD_GLOBAL 0 (int)
2 LOAD_FAST 0 (string)
4 CALL_FUNCTION 1
6 RETURN_VALUE
"""
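Note how the disassembly of the first two versions, which combine a list comprehension with a generator, shows two for loops: one in main (the list comprehension) and one in parse (the generator). That is not as bad as it sounds, is it? For example, the whole operation is O(n) rather than O(n^2), right?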
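A small experiment suggests why the two loops do not multiply: the consumer pulls one item at a time from the generator, so the loops interleave instead of nesting. In this sketch the `events` log is a hypothetical addition, and `main` uses an explicit loop in place of the list comprehension so each check can be recorded.

```python
events = []  # hypothetical log of what happens, in order


def parse(strings):
    for string in strings:
        events.append(f"parse {string}")
        yield int(string)


def main(strings):
    result = []
    for i in parse(strings):  # same iteration the list comprehension does
        events.append(f"check {i}")
        if i > 3:
            result.append(i)
    return result


result = main(["3", "4", "5"])
print(result)  # [4, 5]
print(events)
# ['parse 3', 'check 3', 'parse 4', 'check 4', 'parse 5', 'check 5']
```

Each input is parsed once and checked once, so the total work grows linearly with the number of strings.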
def parse(string):
    return int(string)

def main(strings):
    return [val for val in (parse(string) for string in strings) if val > 3]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]
dis.dis(main.__code__.co_consts[1])
"""
2 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (val)
8 LOAD_FAST 1 (val)
10 LOAD_CONST 0 (3)
12 COMPARE_OP 4 (>)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (val)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
"""
dis.dis(parse)
"""
2 0 LOAD_GLOBAL 0 (int)
2 LOAD_FAST 0 (string)
4 CALL_FUNCTION 1
6 RETURN_VALUE
"""
Answer 0 (score: 2)
I think you can do this more simply than you imagine:
olds = ["1", "2", "3", "4", "5"]
news = [new for new in (parse(old) for old in olds) if new > 3]
Or simply:
news = [new for new in map(parse, olds) if new > 3]
Either way, parse is called only once per item.
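The claim can be checked with a call counter. In this minimal sketch, `call_count` is a hypothetical counter added to `parse` purely for the check:

```python
call_count = 0  # hypothetical counter, for illustration only


def parse(foo):
    global call_count
    call_count += 1
    return int(foo)


olds = ["1", "2", "3", "4", "5"]

# Variant 1: nested generator expression
news_genexp = [new for new in (parse(old) for old in olds) if new > 3]
calls_genexp = call_count

# Variant 2: map
call_count = 0
news_map = [new for new in map(parse, olds) if new > 3]
calls_map = call_count

print(news_genexp, calls_genexp)  # [4, 5] 5
print(news_map, calls_map)        # [4, 5] 5
```

Both variants produce the same list with exactly one parse per input string.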