Question

I have an insanely large directory. I need to get its file list via Python.

In my code I need to get an iterator, not a list. So these don't work:

os.listdir
glob.glob  (uses listdir!)
os.walk

I can't find any good lib. Help! Maybe a C++ lib?

Answer 1

For Python 2.x:

import scandir
scandir.walk()

For Python 3.5+:

os.scandir()

https://www.python.org/dev/peps/pep-0471/

https://pypi.python.org/pypi/scandir
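
A minimal sketch of the Python 3 route, assuming Python 3.6+ (where os.scandir() can be used as a context manager). The helper name iter_files is made up for illustration; on Python 2 the scandir package exposes scandir.scandir() for the same purpose.

```python
import os


def iter_files(path):
    """Yield the names of regular files in `path`, one at a time."""
    # os.scandir() returns an iterator of DirEntry objects instead of
    # building the whole listing in memory the way os.listdir() does.
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False avoids an extra stat() call for symlinks
            if entry.is_file(follow_symlinks=False):
                yield entry.name
```

Nothing is read until you start looping, e.g. `for name in iter_files("/some/huge/dir"): ...` (the path is a placeholder).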

Answer 2

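If your directory is too big for libc readdir() to read it quickly, you probably want to look at the kernel call getdents() (http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html). I ran into a similar problem and wrote a long blog post about it.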

http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/

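Basically, readdir() only reads 32K of directory entries at a time, so if you have a lot of files in a directory, readdir() will take a very long time to complete.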

Answer 3

I think using opendir would work, and there is a Python package that wraps it via Pyrex: http://pypi.python.org/pypi/opendir/0.0.1

Answer 4

You should use a generator. This problem is discussed here: http://bugs.python.org/issue11406
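
To illustrate why a generator helps, here is a rough sketch of consuming any lazy directory iterator in fixed-size chunks, so that only one chunk is held in memory at a time. The helper name batches and the chunk size are assumptions for the example, not something from the linked issue.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items taken from any iterable."""
    iterator = iter(iterable)
    while True:
        # islice pulls only the next `size` items from the iterator
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

With Python 3.5+ this could be fed from a directory iterator, e.g. `for chunk in batches(os.scandir(path), 1000): ...`.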

Answer 5

I found this library useful: https://github.com/benhoyt/scandir

Answer 6

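Someone built a Python module based on that article that wraps getdents. By the way, I know this post is old, but you can use scandir (I have done this with directories of 21 million files). walk is also a generator, but it is far too slow because of its overhead.

This module looks like an interesting alternative. I haven't used it, but the author did base it on the 8-million-file ls article mentioned above. Reading through the code, I think it would be fun and faster to use.

It also lets you tune the buffer size without having to drop into C.

https://github.com/ZipFile/python-getdents It is also available via pip and PyPI, though I recommend reading the docs first.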

https://pypi.org/project/getdents/

Answer 7

I found this library really fast.
https://pypi.org/project/scandir/
I used the following code from this library, and it worked like a charm.

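import os  # os.scandir needs Python 3.5+; on Python 2 use scandir.scandir from the scandir package


def subdirs(path):
    """Yield directory names not starting with '.' under given path."""
    for entry in os.scandir(path):
        # entry.is_dir() normally needs no extra system call
        if not entry.name.startswith('.') and entry.is_dir():
            yield entry.name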

Answer 8

http://docs.python.org/release/2.6.5/library/os.html#os.walk

>>> import os
>>> type(os.walk('/'))
<type 'generator'>

Answer 9

How about glob.iglob? It is the iterator version of glob.
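
A small sketch of what that could look like, stopping after the first few matches. The helper name first_matches is hypothetical. Note that iglob only guarantees an iterator over the results; whether the underlying directory read is itself lazy depends on the Python version (the question points out that glob.glob uses listdir), so this may not help for one enormous directory.

```python
import glob
from itertools import islice


def first_matches(pattern, limit):
    """Return up to `limit` paths matching `pattern`, then stop iterating."""
    # glob.iglob yields matches one by one instead of returning a full list
    return list(islice(glob.iglob(pattern), limit))
```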

Best way to get a file list of a big directory in Python?