Question

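For unrelated reasons, I am combining some data structures in a certain way, while also replacing Python 2.7's default dict with OrderedDict. The data structures use tuples as keys in dictionaries. Please ignore those details (replacing the dict type is not useful in the example below, but it is in the real code).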

import __builtin__
import collections
import contextlib
import itertools


def combine(config_a, config_b):
    return (dict(first, **second) for first, second in itertools.product(config_a, config_b))


@contextlib.contextmanager
def dict_as_ordereddict():
    dict_orig = __builtin__.dict
    try:
        __builtin__.dict = collections.OrderedDict
        yield
    finally:
        __builtin__.dict = dict_orig

At first this works as expected (dict can take non-string keyword arguments as a special case):

print 'one level nesting'
with dict_as_ordereddict():
    result = combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    )
print list(result)
print

Output:

one level nesting
[{(0, 1): 'a', (4, 5): 'c', (2, 3): 'b', (6, 7): 'd'}]

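However, when the calls to combine are nested, the dict reference inside the generator expression is resolved to OrderedDict, which lacks dict's special behavior of accepting tuples as keyword arguments: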

print 'two level nesting'
with dict_as_ordereddict():
    result = combine(combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )
print list(result)
print

Output:

two level nesting
Traceback (most recent call last):
  File "test.py", line 36, in <module>
    [{(8, 9): 'e', (10, 11): 'f'}]
  File "test.py", line 8, in combine
    return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
  File "test.py", line 8, in <genexpr>
    return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
TypeError: __init__() keywords must be strings

Furthermore, implementing it with yield instead of a generator expression fixes the problem:

def combine_yield(config_a, config_b):
    for first, second in itertools.product(config_a, config_b):
        yield dict(first, **second)


print 'two level nesting, yield'
with dict_as_ordereddict():
    result = combine_yield(combine_yield(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )
print list(result)
print

Output:

two level nesting, yield
[{(0, 1): 'a', (8, 9): 'e', (2, 3): 'b', (4, 5): 'c', (6, 7): 'd', (10, 11): 'f'}]

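Questions:

Why is some item (only the first?) of the generator expression evaluated before it is required in the second example, or what is it required for?
Why is it not evaluated in the first example? I actually expected this behavior in both cases.
Why does the yield-based version work?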

Answer 1

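Before going into the details, note the following: itertools.product evaluates its iterator arguments in order to compute the product. This can be seen from the equivalent Python implementation in the documentation (the first line is the relevant one):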

def product(*args, **kwds):
    pools = map(tuple, args) * kwds.get('repeat', 1)
    ...

You can also try this with a custom class and a short test script:

import itertools


class Test:
    def __init__(self):
        self.x = 0

    def __iter__(self):
        return self

    def next(self):
        print('next item requested')
        if self.x < 5:
            self.x += 1
            return self.x
        raise StopIteration()


t = Test()
itertools.product(t, t)

Creating the itertools.product object shows in the output that all of the iterator's items are requested immediately.

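This means that the iterator arguments are evaluated as soon as itertools.product is called. This matters because in the first case the arguments are just two lists, so there is no problem. You then evaluate the final result via list(result) after the context manager dict_as_ordereddict has exited, so all calls to dict are resolved to the normal builtin dict.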
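To illustrate the timing argument in isolation, here is a minimal sketch that is not taken from the question's code. Registry, TaggedDict, swapped_factory and build_all are hypothetical names: Registry stands in for the __builtin__ module and TaggedDict for OrderedDict. It only shows that the body of a generator expression looks up the swapped attribute when it runs, not when the generator is created.

```python
import contextlib


class Registry(object):
    """Hypothetical stand-in for the __builtin__ module."""
    factory = dict


class TaggedDict(dict):
    """Hypothetical replacement type, standing in for OrderedDict."""


@contextlib.contextmanager
def swapped_factory():
    # Temporarily replace Registry.factory, like dict_as_ordereddict does for dict.
    original = Registry.factory
    try:
        Registry.factory = TaggedDict
        yield
    finally:
        Registry.factory = original


def build_all(items):
    # Registry.factory is looked up each time the body runs,
    # not when the generator expression is created.
    return (Registry.factory(item) for item in items)


with swapped_factory():
    lazy = build_all([{'a': 1}])                    # created inside, not consumed yet
    consumed_inside = list(build_all([{'b': 2}]))   # consumed inside the block

consumed_outside = list(lazy)                       # consumed after the block exited

print(type(consumed_inside[0]).__name__)   # TaggedDict
print(type(consumed_outside[0]).__name__)  # dict
```

The generator consumed inside the block sees the replacement, while the one consumed afterwards sees the restored original, which mirrors the difference between the two examples in the question.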

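In the second example, the inner call to combine still works fine and returns a generator expression, which is then used as one of the arguments to itertools.product in the second combine call. As we saw above, these arguments are evaluated immediately, so the generator object is asked to produce its values. To do so, it needs to resolve dict. However, we are now still inside the context manager dict_as_ordereddict, so dict is resolved to OrderedDict, which does not accept non-string keys for keyword arguments.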

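It is important to note that the first version, which uses return, needs to create the generator object in order to return it. That involves creating the itertools.product object, because the iterable of the outermost for clause of a generator expression is evaluated immediately when the expression is created. This means that this version is only as lazy as itertools.product.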
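The difference in laziness can be sketched as follows. tracked_product is a hypothetical wrapper, introduced here only to record the moment itertools.product is called. The example adds numbers instead of merging dicts so that it does not depend on the Python 2 keyword-argument behavior.

```python
import itertools

calls = []


def tracked_product(*iterables):
    # Hypothetical wrapper: records when itertools.product is actually called.
    calls.append('product called')
    return itertools.product(*iterables)


def combine_genexp(a, b):
    # The outermost iterable of a generator expression is evaluated right here,
    # at the moment the expression is created.
    return (x + y for x, y in tracked_product(a, b))


def combine_yield(a, b):
    # Nothing in this body runs until the first item is requested.
    for x, y in tracked_product(a, b):
        yield x + y


gen_a = combine_genexp([1, 2], [10, 20])
calls_after_genexp = list(calls)    # product was already called once

gen_b = combine_yield([1, 2], [10, 20])
calls_after_yield = list(calls)     # unchanged: the function body has not started

values_b = list(gen_b)              # only now does the body run
calls_after_consume = list(calls)

print(calls_after_genexp)        # ['product called']
print(calls_after_yield)         # ['product called']
print(len(calls_after_consume))  # 2
```

Creating the generator expression already triggers the call, whereas the generator function defers it until the values are consumed.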

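Now to the question of why the yield version works. By using yield, calling the function returns a generator. This is a truly lazy version, in the sense that execution of the function body does not start until items are requested. This means that neither the inner nor the outer call to combine starts executing the function body, and therefore itertools.product is not called, until the items are requested via list(result). You can check this by putting one additional print statement inside the function and another at the end of the context manager:

def combine(config_a, config_b):
    print 'start'
    # return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
    for first, second in itertools.product(config_a, config_b):
        yield dict(first, **second)


with dict_as_ordereddict():
    result = combine(combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )
    print 'end of context manager'
print list(result)
print

With the yield version, it prints the following:

end of context manager
start
start

That is, the generator is only started when the results are requested via list(result). This differs from the return version (commented out in the code above). With that version you will see

start
start

and the error is raised before the end of the context manager is reached.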

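As a side note, for your code to work, the replacement of dict has to be ineffective (as it is in the first version), so I don't see why you would use that context manager at all. Secondly, dict literals are not ordered in Python 2, and neither are keyword arguments, which also defeats the purpose of using OrderedDict. Also note that in Python 3 the non-string keyword argument behavior of dict has been removed; the clean way to update a dictionary with arbitrary keys is dict.update.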
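A minimal sketch of a merge that avoids keyword arguments, assuming the goal is simply to combine every pair of dictionaries. combine_update is a hypothetical name, not part of the question's code. It copies the first dictionary and updates it with the second, so keys of any hashable type work.

```python
import itertools


def combine_update(config_a, config_b):
    # Merge each pair without keyword arguments, so tuple keys are fine.
    for first, second in itertools.product(config_a, config_b):
        merged = dict(first)    # shallow copy of the first mapping
        merged.update(second)   # keys from the second mapping win on conflict
        yield merged


result = list(combine_update(
    combine_update(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
    [{(8, 9): 'e', (10, 11): 'f'}]
))
print(len(result))     # 1
print(len(result[0]))  # 6
```

Because this version never passes the second mapping as keyword arguments, it should behave the same whether dict or OrderedDict ends up being called.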

Generator expression vs. generator function and surprisingly eager evaluation
