Question

I am trying to write a function that finds a directory by walking the directory tree with os.walk(), as shown below. On my machine this takes 15 seconds.

import os

for dir_path, dir_names, filenames in os.walk(os.path.expanduser('~')):
    for dir_name in dir_names:
        if dir_name == 'some_dir':
            path = os.path.join(dir_path, dir_name)
            print(path)

I read that os.scandir() is faster, so I tried it below, although I think my implementation is wrong. It works, but it now takes almost 30 seconds.

import os

for dir_path, dir_names, filenames in os.walk(os.path.expanduser('~')):
    with os.scandir(dir_path) as entries:
        for entry in entries:
            if entry.name.endswith('some_dir') and entry.is_dir():
                print(entry.path)

How can I speed this up?

Answer 1

One suggestion is to replace the for dir_name in dir_names part:

for dir_path, dir_names, filenames in os.walk(os.path.expanduser('~')):
    if 'some_dir' in dir_names:
        path = os.path.join(dir_path, 'some_dir')
        print(path)

I don't know whether you have many subdirectories, but according to this, it should make the code faster.

Also, I suggest commenting out

os.scandir() is preferable to os.listdir when you need additional information about the file type. But dir_names already contains only the subdirectories of each folder, so calling os.scandir() on top of os.walk() adds extra overhead, which is why it is much slower than your original code.

If you are using Python >= 3.5, os.walk() already calls os.scandir() under the hood. As Charles Duffy noted, calling os.scandir() separately in a recursive fashion probably won't be any faster.
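To make the comparison concrete, here is a minimal sketch of both approaches wrapped in functions. The names find_dirs_walk and find_dirs_scandir are hypothetical helpers introduced only for illustration, and the hand-rolled os.scandir() traversal is an assumption about what a standalone version might look like, not something taken from the question. It is meant for timing experiments and is not claimed to be faster.

```python
import os


def find_dirs_walk(root, target):
    """Return paths of all directories named `target` below `root`, using os.walk()."""
    matches = []
    for dir_path, dir_names, _filenames in os.walk(root):
        # One membership test per visited directory instead of a Python-level loop.
        if target in dir_names:
            matches.append(os.path.join(dir_path, target))
    return matches


def find_dirs_scandir(root, target):
    """Same search with a manual os.scandir() traversal (hypothetical, for comparison)."""
    matches = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # is_dir() follows symlinks here, so linked directories still count.
                    if not entry.is_dir():
                        continue
                    if entry.name == target:
                        matches.append(entry.path)
                    # Do not descend into symlinked directories.
                    if not entry.is_symlink():
                        stack.append(entry.path)
        except OSError:
            # Skip the rest of a directory that cannot be read.
            continue
    return matches
```

Calling each function on os.path.expanduser('~') and timing the calls with time.perf_counter() should show whether the manual traversal gains anything on a given machine. Because os.walk() is already built on os.scandir() in Python >= 3.5, a large difference would be surprising.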

How can I speed up a directory walk?
