Question

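I have this code that finds the newest zip file in each directory. It runs very quickly when there are only a few folders, but with many folders, such as the 789 folders containing zip files that I need to go through, it takes more than 30 minutes to produce output. Any tips on how to make this code run faster?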

import os, glob

cwd = os.getcwd()

list_of_latest = []
for (dirname, dirs, files) in os.walk(cwd):
    for filename in files:
        if filename.endswith('.zip'):
            list_of_files = glob.glob(dirname + '\*.zip')
            latest_file = max(list_of_files, key=os.path.getctime) 
            if latest_file not in list_of_latest:
                list_of_latest.append(latest_file)

for i in list_of_latest:
    print i

Thanks in advance!

Answer 1

You may not realise it, but there is a redundant loop in your code. It is this part:

for filename in files:
    if filename.endswith('.zip'):
        list_of_files = glob.glob(dirname + '\*.zip')

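glob.glob retrieves all the zip files in the current directory (the one given by dirname, which is the root path of the current os.walk step). So if that directory contains 10 zip files, you run glob.glob 10 times! Each call finds the same files, and max then calls os.path.getctime on every one of them again. The result is only appended to the list the first time.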
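To put a rough number on that redundancy, here is a minimal sketch that estimates how many glob.glob and os.path.getctime calls the original loop makes for a given tree. The function name estimate_original_cost is hypothetical, and the count is only an approximation: it matches names with endswith('.zip'), which can differ slightly from what glob's '*.zip' pattern matches (for example with upper-case extensions on Windows).

```python
import os


def estimate_original_cost(root):
    """Estimate (glob_calls, getctime_calls) made by the original nested loop.

    Hypothetical helper for illustration only; it does not call glob itself.
    """
    glob_calls = 0
    getctime_calls = 0
    for dirname, dirs, files in os.walk(root):
        # Number of zip files directly inside this directory.
        n = sum(1 for name in files if name.endswith('.zip'))
        # The original code calls glob.glob once per zip file ...
        glob_calls += n
        # ... and each time max() calls getctime on all n matches.
        getctime_calls += n * n
    return glob_calls, getctime_calls
```

For a single directory holding 10 zip files this returns (10, 100), whereas one listing and 10 getctime calls would be enough.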

The entire inner loop can be reduced to:

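for (dirname, dirs, files) in os.walk(cwd):
    # One glob call per directory; os.path.join avoids the Windows-only '\*.zip'.
    list_of_files = glob.glob(os.path.join(dirname, '*.zip'))
    if not list_of_files:
        # No zip files in this directory, so there is nothing to compare.
        continue
    latest_file = max(list_of_files, key=os.path.getctime)

    if latest_file not in list_of_latest:
        list_of_latest.append(latest_file)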

That inner loop is unnecessary.
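As a possible further step (not part of the answer above), os.walk already hands over the file names of each directory, so the glob call can be dropped as well. The sketch below assumes a hypothetical helper name, latest_zip_per_dir, and a key parameter that defaults to os.path.getctime as in the question. Because os.walk visits each directory once, every path can be appended at most once and no duplicate check is needed.

```python
import os


def latest_zip_per_dir(root, key=os.path.getctime):
    """Return the newest .zip file of every directory under root.

    Hypothetical helper; `key` decides what "newest" means and
    defaults to os.path.getctime, as in the question.
    """
    latest = []
    for dirname, dirs, files in os.walk(root):
        # Reuse the names os.walk already collected instead of calling glob.
        zips = [os.path.join(dirname, name)
                for name in files if name.endswith('.zip')]
        if zips:  # skip directories without zip files
            latest.append(max(zips, key=key))
    return latest
```

Calling latest_zip_per_dir(os.getcwd()) and printing each element would mirror the output loop of the original script.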

Answer 2

You are iterating over all the files in the directory twice, once with:

for filename in files:

and then again with:

latest_file = max(list_of_files, key=os.path.getctime)

What you probably want is:

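for (dirname, dirs, files) in os.walk(cwd):
    list_of_files = glob.glob(os.path.join(dirname, '*.zip'))
    if not list_of_files:
        # max() raises ValueError on an empty list, so skip directories without zips.
        continue
    latest_file = max(list_of_files, key=os.path.getctime)
    if latest_file not in list_of_latest:
        list_of_latest.append(latest_file)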

Oh, and if you use a set instead of a list for list_of_latest, the membership check goes away and you can simplify further:

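list_of_latest = set()
for (dirname, dirs, files) in os.walk(cwd):
    list_of_files = glob.glob(os.path.join(dirname, '*.zip'))
    if not list_of_files:
        # max() raises ValueError on an empty list, so skip directories without zips.
        continue
    latest_file = max(list_of_files, key=os.path.getctime)
    list_of_latest.add(latest_file)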
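If the script can run on Python 3.6 or later (an assumption, since the question uses Python 2 syntax), os.scandir is another option worth trying. This swaps in a different technique from the one in the answers: it lists each directory once and reads the timestamp from the DirEntry object, which on Windows usually avoids an extra system call per file. The function name below is hypothetical, and whether it is actually faster on a given machine would need to be measured.

```python
import os


def latest_zip_per_dir_scandir(root):
    """Return the newest .zip file (by st_ctime) of every directory under root.

    Hypothetical sketch using os.scandir; requires Python 3.6+.
    """
    latest = []
    stack = [root]  # directories still to visit
    while stack:
        current = stack.pop()
        newest_path, newest_time = None, None
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.name.endswith('.zip') and entry.is_file():
                    ctime = entry.stat().st_ctime
                    if newest_time is None or ctime > newest_time:
                        newest_path, newest_time = entry.path, ctime
        if newest_path is not None:
            latest.append(newest_path)
    return latest
```

The order of the returned paths depends on the traversal order and is not sorted.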

Getting the latest file in a directory takes a long time to run
