Question

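There are 10,000 files in a folder. A few of them were created on 2018-06-01, a few on 2018-06-09, and so on. I need to find all files created on 2018-06-09. However, it takes a very long time (almost 2 hours) to read each file, get its creation date, and then pick out the files created on 2018-06-09.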

    import os
    from datetime import datetime

    for file in os.scandir(Path):
        if file.is_file():
            file_ctime = datetime.fromtimestamp(os.path.getctime(file)).strftime('%Y-%m-%d %H:%M:%S')
            if file_ctime[0:10] == '2018-06-09':  # first 10 characters are the date part
                .....

Answer 1

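You can try using os.listdir(path) to get all files and directories under the given path.

Once you have all the files and directories, you can use filter and a lambda function to build a new list containing only the files with the desired timestamp.

You can then iterate over that list to do the work you need on the right files.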
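The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the helper name `files_created_on` and its parameters are made up for the example, the day is interpreted in local time, and `os.path.getctime()` is treated as the creation time (on Unix it is actually the time of the last metadata change).

```python
import os
from datetime import datetime, timedelta


def files_created_on(path, day):
    """Return the names in `path` whose ctime falls on `day` (a datetime.date, local time)."""
    midnight = datetime(day.year, day.month, day.day)
    start = midnight.timestamp()
    end = (midnight + timedelta(days=1)).timestamp()

    # filter + lambda: keep only regular files whose ctime lies in [start, end)
    return list(filter(
        lambda name: os.path.isfile(os.path.join(path, name))
        and start <= os.path.getctime(os.path.join(path, name)) < end,
        os.listdir(path),
    ))
```

Calling `files_created_on(some_folder, date(2018, 6, 9))` would then give the list to iterate over; the timestamps are compared as numbers, so no string formatting is involved.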

Answer 2

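Let's start with the most basic thing: why are you building a datetime only to reformat it into a string and then do a string comparison?

Next, use os.scandir() rather than os.listdir() - os.scandir() returns os.DirEntry objects, which cache the file statistics obtained through the os.DirEntry.stat() call.

Depending on the checks you need to perform, os.listdir() might even perform better if you expect to do a lot of filtering on the file name, because then you don't need to build a whole os.DirEntry just to discard it.

So, to optimize your loop, if you don't expect a lot of filtering on the name: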

for entry in os.scandir(Path):
    if entry.is_file() and 1528495200 <= entry.stat().st_ctime < 1528581600:
        pass  # do whatever you need with it

If you do, then it's better to stick with os.listdir():

import os
import stat

for entry in os.listdir(Path):
    # do your filtering on the entry name first...
    path = os.path.join(Path, entry)  # build path to the listed entry...
    stats = os.stat(path)  # cache the file entry statistics
    if stat.S_ISREG(stats.st_mode) and 1528495200 <= stats.st_ctime < 1528581600:
        pass  # do whatever you need with it

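If you want to be flexible with the timestamps, use datetime.datetime.timestamp() beforehand to get POSIX timestamps; you can then compare them directly against what stat_result.st_ctime returns, without any conversion.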
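As a minimal sketch of that idea, the bounds for a given day can be computed once and reused inside the loop. The helper names `day_bounds` and `scan_created_on` are hypothetical, and the day is assumed to be in the machine's local time zone.

```python
import os
from datetime import datetime, timedelta


def day_bounds(year, month, day):
    """Return (start, end) POSIX timestamps for one local-time calendar day."""
    start = datetime(year, month, day)
    end = start + timedelta(days=1)
    return start.timestamp(), end.timestamp()


def scan_created_on(path, year, month, day):
    """Yield os.DirEntry objects for regular files whose st_ctime is on that day."""
    start, end = day_bounds(year, month, day)
    with os.scandir(path) as entries:
        for entry in entries:
            # Plain float comparison against st_ctime, no per-file datetime objects
            if entry.is_file() and start <= entry.stat().st_ctime < end:
                yield entry
```

For example, `for entry in scan_created_on(Path, 2018, 6, 9): ...` would replace the hard-coded numbers in the loops above.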

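That said, even your original, non-optimized approach should be significantly faster than 2 hours for a mere 10k entries. I'd check the underlying file system as well; something seems to be wrong there.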
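One way to check whether the file system rather than the Python loop is the slow part is to time a bare pass over the folder that does nothing except the metadata lookup. The function name `time_stat_scan` below is made up for this sketch.

```python
import os
import time


def time_stat_scan(path):
    """Return (entry_count, elapsed_seconds) for one scandir pass with a stat() per entry."""
    started = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            entry.stat()  # the same per-file metadata lookup the date filter needs
            count += 1
    return count, time.perf_counter() - started
```

If this alone takes a long time on the 10k-file folder, the delay probably comes from the storage (a slow network share, for instance) and not from the date comparison.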
