List comprehensions and generators to avoid computing the same value twice when using conditional expressions

Time: 2019-11-15 20:10:53

Tags: python python-3.x generator list-comprehension bytecode

Suppose you have some expensive, CPU-intensive function, for example one that parses an XML string. For this question, our trivial stand-in will be:

def parse(foo):
    return int(foo)

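As input you have a list of strings. You want to parse them and find the subset of parsed values that meets some condition. Ideally, each string is parsed only once.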

Without a list comprehension, you could write:

olds = ["1", "2", "3", "4", "5"]
news = []
for old in olds:
    new = parse(old)      # First and only Parse
    if new > 3:
        news.append(new)

To do this as a list comprehension, it seems you have to parse twice: once to get the new value and once to perform the conditional check:

olds = ["1", "2", "3", "4", "5"]
news = [
    parse(new)         # First Parse
    for new in olds
    if parse(new) > 3  # Second Parse
]
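
To make the duplicated work visible, here is a minimal sketch that counts how often parse runs. The `calls` counter is a hypothetical addition for illustration and is not part of the original code.

```python
calls = 0  # hypothetical counter, added only to illustrate the double parse

def parse(foo):
    global calls
    calls += 1
    return int(foo)

olds = ["1", "2", "3", "4", "5"]
news = [
    parse(new)         # runs only for items that pass the filter
    for new in olds
    if parse(new) > 3  # runs for every item
]

print(news)   # [4, 5]
print(calls)  # 7: five calls for the filter plus two for the kept items
```

The extra cost grows with the number of items that pass the condition, so in the worst case every string is parsed twice.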

For example, this syntax does not work:

olds = ["1", "2", "3", "4", "5"]
# Raises SyntaxError: can't assign to function call
news = [i for parse(i) in olds if i > 5]

Using a generator seems to work:

def parse(strings):
    for string in strings:
        yield int(string)

olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds) if i > 3]

Alternatively, you could move the conditional into the generator:

def parse(strings):
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds)]

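What I'd like to know is which of these is better in terms of optimization (not reusability, etc.): parsing in the generator with the conditional check in the list comprehension, or doing both the parsing and the conditional check in the generator? Is there an alternative that is better than either?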
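
One way to compare the two variants empirically is to time them. The sketch below uses timeit from the standard library. The function names `parse_only`, `parse_and_filter`, `filter_in_comprehension` and `filter_in_generator`, the input size, and the repeat count are all assumptions made for illustration, and the second variant uses list() rather than a comprehension to collect the results. Timings depend on the machine and Python version, so no expected numbers are shown.

```python
import timeit

def parse_only(strings):
    # Generator parses; the caller filters.
    for string in strings:
        yield int(string)

def parse_and_filter(strings):
    # Generator parses and filters.
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

def filter_in_comprehension(strings):
    return [i for i in parse_only(strings) if i > 3]

def filter_in_generator(strings):
    return list(parse_and_filter(strings))

# Hypothetical workload: the five sample strings repeated 1000 times.
olds = ["1", "2", "3", "4", "5"] * 1000

t_comp = timeit.timeit(lambda: filter_in_comprehension(olds), number=100)
t_gen = timeit.timeit(lambda: filter_in_generator(olds), number=100)

print(f"filter in comprehension: {t_comp:.4f} s")
print(f"filter in generator:     {t_gen:.4f} s")
```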


Below is some output from dis.dis in Python 3.6.5. Note that in my version of Python, to disassemble the list comprehension we have to use f.__code__.co_consts[1]. See this answer for an explanation.

Generator does the parsing, list comprehension does the conditional check

import dis

def parse(strings):
    for string in strings:
        yield int(string)

def main(strings):
    return [i for i in parse(strings) if i > 3]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (3)
             12 COMPARE_OP               4 (>)
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (i)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 SETUP_LOOP              22 (to 24)
              2 LOAD_FAST                0 (strings)
              4 GET_ITER
        >>    6 FOR_ITER                14 (to 22)
              8 STORE_FAST               1 (string)

  3          10 LOAD_GLOBAL              0 (int)
             12 LOAD_FAST                1 (string)
             14 CALL_FUNCTION            1
             16 YIELD_VALUE
             18 POP_TOP
             20 JUMP_ABSOLUTE            6
        >>   22 POP_BLOCK
        >>   24 LOAD_CONST               0 (None)
             26 RETURN_VALUE
"""

Generator does both the parsing and the conditional check

def parse(strings):
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

def main(strings):
    return [i for i in parse(strings)]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
"""
dis.dis(parse)
"""
  2           0 SETUP_LOOP              34 (to 36)
              2 LOAD_FAST                0 (strings)
              4 GET_ITER
        >>    6 FOR_ITER                26 (to 34)
              8 STORE_FAST               1 (string)

  3          10 LOAD_GLOBAL              0 (int)
             12 LOAD_FAST                1 (string)
             14 CALL_FUNCTION            1
             16 STORE_FAST               2 (val)

  4          18 LOAD_FAST                2 (val)
             20 LOAD_CONST               1 (3)
             22 COMPARE_OP               4 (>)
             24 POP_JUMP_IF_FALSE        6

  5          26 LOAD_FAST                2 (val)
             28 YIELD_VALUE
             30 POP_TOP
             32 JUMP_ABSOLUTE            6
        >>   34 POP_BLOCK
        >>   36 LOAD_CONST               0 (None)
             38 RETURN_VALUE
"""

Naive tight loop

def parse(string):
    return int(string)

def main(strings):
    values = []
    for string in strings:
        value = parse(string)
        if value > 3:
            values.append(value)
    return values

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main)
"""
  2           0 BUILD_LIST               0
              2 STORE_FAST               1 (values)

  3           4 SETUP_LOOP              38 (to 44)
              6 LOAD_FAST                0 (strings)
              8 GET_ITER
        >>   10 FOR_ITER                30 (to 42)
             12 STORE_FAST               2 (string)

  4          14 LOAD_GLOBAL              0 (parse)
             16 LOAD_FAST                2 (string)
             18 CALL_FUNCTION            1
             20 STORE_FAST               3 (value)

  5          22 LOAD_FAST                3 (value)
             24 LOAD_CONST               1 (3)
             26 COMPARE_OP               4 (>)
             28 POP_JUMP_IF_FALSE       10

  6          30 LOAD_FAST                1 (values)
             32 LOAD_ATTR                1 (append)
             34 LOAD_FAST                3 (value)
             36 CALL_FUNCTION            1
             38 POP_TOP
             40 JUMP_ABSOLUTE           10
        >>   42 POP_BLOCK

  7     >>   44 LOAD_FAST                1 (values)
             46 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 LOAD_GLOBAL              0 (int)
              2 LOAD_FAST                0 (string)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE
"""

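Note how the disassembly of the first two versions, which combine a list comprehension with a generator, shows two for loops: one in main (the list comprehension) and one in parse (the generator). That isn't as bad as it sounds, is it? That is, the whole operation is still O(n) rather than O(n^2)?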
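
The two loops are not nested: the generator is lazy, so each step of the comprehension's loop advances the generator's loop by exactly one item. A minimal sketch of that interleaving follows. The `events` log and the `check` helper are hypothetical additions for tracing only.

```python
events = []  # hypothetical trace log, for illustration only

def parse(strings):
    for string in strings:
        events.append(f"parse {string}")
        yield int(string)

def check(value):
    # Hypothetical helper: records the check, then applies the condition.
    events.append(f"check {value}")
    return value > 3

def main(strings):
    return [i for i in parse(strings) if check(i)]

result = main(["1", "2", "3", "4", "5"])
print(result)       # [4, 5]
print(events[:4])   # ['parse 1', 'check 1', 'parse 2', 'check 2']
print(len(events))  # 10: one parse and one check per input string
```

Each input string is parsed once and checked once, so the work is linear in the length of the input.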

Edit: here is khelwood's solution:

def parse(string):
    return int(string)

def main(strings):
    return [val for val in (parse(string) for string in strings) if val > 3]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (val)
              8 LOAD_FAST                1 (val)
             10 LOAD_CONST               0 (3)
             12 COMPARE_OP               4 (>)
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (val)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 LOAD_GLOBAL              0 (int)
              2 LOAD_FAST                0 (string)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE
"""

1 answer:

Answer 0 (score: 2)

I think you can do this more simply than you think:

olds = ["1", "2", "3", "4", "5"]
news = [new for new in (parse(old) for old in olds) if new > 3]

Or just:

news = [new for new in map(parse, olds) if new > 3]

Either way, parse is called only once per item.
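
A quick way to confirm this is to count the calls. In the sketch below, the `calls` counter is a hypothetical addition for illustration. The last variant is not from the answer above: it uses an assignment expression and therefore assumes Python 3.8 or later, which is newer than the 3.6.5 used in the question.

```python
calls = 0  # hypothetical counter, added only for illustration

def parse(foo):
    global calls
    calls += 1
    return int(foo)

olds = ["1", "2", "3", "4", "5"]

# Nested generator expression
news_gen = [new for new in (parse(old) for old in olds) if new > 3]
calls_gen = calls

# map
calls = 0
news_map = [new for new in map(parse, olds) if new > 3]
calls_map = calls

# Assignment expression (assumes Python 3.8+)
calls = 0
news_walrus = [new for old in olds if (new := parse(old)) > 3]
calls_walrus = calls

print(news_gen, news_map, news_walrus)      # [4, 5] [4, 5] [4, 5]
print(calls_gen, calls_map, calls_walrus)   # 5 5 5
```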