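Unexpected results with filter or generators

Time: 2014-10-22 18:00:25

Tags: python python-3.x generator-expression

This is an interesting one. I was actually writing an answer to another question when I ran into some unexpected results with filter and generators. I have a list of file paths: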

paths = ['/directoryb/baba.txt', '/directorya/nigel.txt', '/directoryb/ralph.txt', '/directorya/jim.txt']

I build a set of the distinct directories in the list of paths:

from os.path import dirname
dirs = {dirname(path) for path in paths}

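Now I want to make a list of generators (or even a generator of generators), each of which yields the elements of paths that live in the same directory. So I do this: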

dirs_iter = [(path for path in paths if path.startswith(dir)) for dir in dirs]

To my surprise, after running:

for dir_iter in dirs_iter:
    for path in dir_iter:
        print(path)

I get the following:

/directorya/nigel.txt
/directorya/jim.txt
/directorya/nigel.txt
/directorya/jim.txt

This is obviously wrong. However, if I use the following statement:

# now I'm generating the lists instead of using generators
dirs_iter = [[path for path in paths if path.startswith(dir)] for dir in dirs]

the printing loop shows the expected answer:

/directoryb/baba.txt
/directoryb/ralph.txt
/directorya/nigel.txt
/directorya/jim.txt

If I use filter and/or map instead of generators:

dirs_iter = map(lambda dir: filter(lambda path: path.startswith(dir), paths), dirs)

I also get the wrong answer. Edit: the map / filter version does work.

What is happening here?

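1 answer:

Answer 0: (score: 2)

The name dir is a closure, looked up when the generator is executed, not when it is defined. By that time dir is bound to the last value in dirs: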

>>> from os.path import dirname
>>> paths = ['/directoryb/baba.txt', '/directorya/nigel.txt', '/directoryb/ralph.txt', '/directorya/jim.txt']
>>> dirs = {dirname(path) for path in paths}
>>> def echo(value):
...     print('echoing:', value)
...     return value
... 
>>> dirs_iter = [(path for path in paths if path.startswith(echo(dir))) for dir in dirs]
>>> for dir_iter in dirs_iter:
...     print('Iterating over the next dir_iter generator')
...     for path in dir_iter:
...         print(path)
... 
Iterating over the next dir_iter generator
echoing: /directoryb
/directoryb/baba.txt
echoing: /directoryb
echoing: /directoryb
/directoryb/ralph.txt
echoing: /directoryb
Iterating over the next dir_iter generator
echoing: /directoryb
/directoryb/baba.txt
echoing: /directoryb
echoing: /directoryb
/directoryb/ralph.txt
echoing: /directoryb
>>> list(dirs)
['/directorya', '/directoryb']

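Because Python 3 uses a randomized hash seed, /directoryb came last in my run rather than /directorya. Only when we actually iterate over the dir_iter generators is the dir value accessed (and echoed), and by then it is bound to just that one value. The list(dirs) line shows the order in which the dirs set produces its values.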
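One way to sidestep the late lookup is to give each generator its own binding by creating it inside a function call. The following is only a minimal sketch: the helper name paths_in is hypothetical (it does not appear in the question), and dirs is sorted purely to make the order predictable for the illustration.

```python
from os.path import dirname

paths = ['/directoryb/baba.txt', '/directorya/nigel.txt',
         '/directoryb/ralph.txt', '/directorya/jim.txt']
# Sorted only so that the order is predictable in this illustration.
dirs = sorted({dirname(path) for path in paths})


def paths_in(directory, all_paths):
    # Hypothetical helper: `directory` is a local of this particular call,
    # so the generator expression closes over a binding that never changes.
    return (path for path in all_paths if path.startswith(directory))


# Late binding: each generator looks up `dir` only when it is iterated,
# which happens after the list comprehension has finished looping.
late = [(path for path in paths if path.startswith(dir)) for dir in dirs]
late_result = [list(gen) for gen in late]

# Early binding: each call to paths_in() captures its own directory.
early = [paths_in(dir, paths) for dir in dirs]
early_result = [list(gen) for gen in early]

print(late_result)   # both sublists hold the /directoryb paths
print(early_result)  # one sublist per directory
```

The list-comprehension version from the question avoids the problem for a different reason: the inner lists are built immediately, while dir still holds the current value.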

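Note that filter() does not have this problem; your filter() and map() combination works correctly. There, dir is a parameter of the outer lambda, so every call gets its own binding, and the inner lambda closes over that per-call value.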
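As a quick check of the map / filter version, here is a small sketch that reuses the paths from the question. The outer map() is deliberately turned into a list first, so every filter object already exists before any of them is consumed; dirs is sorted only to make the order predictable, and the names filters and grouped are introduced just for this illustration.

```python
from os.path import dirname

paths = ['/directoryb/baba.txt', '/directorya/nigel.txt',
         '/directoryb/ralph.txt', '/directorya/jim.txt']
# Sorted only so that the order is predictable in this illustration.
dirs = sorted({dirname(path) for path in paths})

# Each call of the outer lambda binds its own `dir` parameter, so the
# inner lambda sees the directory it was created for, even when the
# filter objects are consumed later.
filters = list(map(lambda dir: filter(lambda path: path.startswith(dir), paths), dirs))
grouped = [list(f) for f in filters]

print(grouped)
```

Each sublist contains only the paths of its own directory, which matches the result of the list-based version in the question.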