Question

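I am using a for loop to walk through a large set of files in a directory tree.

While doing so, I want to monitor progress with a progress bar in the console, so I decided to use tqdm for this purpose.

Currently, my code looks like this: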

for dirPath, subdirList, fileList in tqdm(os.walk(target_dir)):
    sleep(0.01)
    dirName = dirPath.split(os.path.sep)[-1]
    for fname in fileList:
        *****

Output:

Scanning Directory....
43it [00:23, 11.24 it/s]

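So my problem is that it does not show a progress bar. I would like to know how to use tqdm properly and to understand better how it works. Also, are there any alternatives to tqdm that could be used here?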

Answer 1

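You cannot show a percentage complete unless you know what "complete" means.

While os.walk is running, it does not know how many files and folders it will end up iterating over: the return type of os.walk has no __len__. It would have to look all the way down the directory tree, enumerating every file and folder, in order to count them. In other words, os.walk would have to do all of its work twice to tell you how many items it is going to produce, which is inefficient.

If you are set on having a progress bar, you can spool the data into an in-memory list: list(os.walk(target_dir)). I do not recommend this. If you are walking a large directory tree, it can consume a lot of memory. Worse, if followlinks is True and you have a cyclic directory structure (children linking to their parents), it could loop forever until you run out of RAM.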
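
To make the point about __len__ concrete, here is a minimal sketch. It builds a small temporary tree (the directory names "a" and "b" are made up for the illustration) and shows that the object returned by os.walk cannot be measured with len, so the only way to learn its size is to consume it.

```python
import os
import tempfile

# Build a tiny throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as target_dir:
    os.makedirs(os.path.join(target_dir, "a", "b"))

    walker = os.walk(target_dir)

    # Generators do not define __len__, so tqdm has nothing to ask for a total.
    has_len = hasattr(walker, "__len__")

    try:
        len(walker)
        len_error = None
    except TypeError as exc:
        len_error = exc  # len() is rejected before anything is iterated

    # The only way to learn the size is to exhaust the generator.
    n_dirs = sum(1 for _ in walker)

print(has_len, type(len_error).__name__, n_dirs)
```

Because the count is only known after the walk has finished, any progress bar with a percentage needs either a second walk or a stored copy of the first one.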

Answer 2

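This is because tqdm does not know how long the result of os.walk will be: it is a generator, so len can't be called on it. You can work around this by converting os.walk(target_dir) to a list first: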

for dirPath, subdirList, fileList in tqdm(list(os.walk(target_dir))):

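From the tqdm module's documentation:

len(iterable) is used if possible. As a last resort, only basic progress statistics are displayed (no ETA, no progressbar).

However, len(os.walk(target_dir)) is not possible, so there is no ETA and no progress bar.

As Benjamin pointed out, using list does use some memory, but not too much. Spooling a directory tree of about 190,000 files caused Python to use about 65MB of memory on my Windows 10 machine.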
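
If you want to check the memory cost on your own tree rather than rely on my figure, the following is a rough sketch using the standard tracemalloc module. The helper name walk_list_memory is made up for this example, and tracemalloc only counts allocations made by Python itself, so the number will not match what a process monitor reports.

```python
import os
import tempfile
import tracemalloc


def walk_list_memory(folder):
    """Return (number of directories, peak bytes allocated) for list(os.walk(folder))."""
    tracemalloc.start()
    entries = list(os.walk(folder))
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(entries), peak


# Demo on a small throwaway tree; point the helper at a real directory
# to get a meaningful figure.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(3):
        os.makedirs(os.path.join(demo_dir, f"d{i}"))
        with open(os.path.join(demo_dir, f"d{i}", "file.txt"), "w") as fh:
            fh.write("x")
    n_entries, peak_bytes = walk_list_memory(demo_dir)

print(f"{n_entries} directories, peak {peak_bytes / 1024:.1f} KiB")
```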

Answer 3

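As explained in the documentation, this is because you need to provide a progress indicator. Depending on what you do with your files, you can use either the number of files or the file sizes.

Other answers suggest converting the os.walk() generator to a list so that you get a __len__ attribute. However, this can cost a lot of memory, depending on the total number of files you have.

Another possibility is to precompute: first walk the whole file tree once and count the total number of files (without keeping the list of files, just the count!), then walk it again and give tqdm your precomputed file count: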

import os
from time import sleep

from tqdm import tqdm

def walkdir(folder):
    """Yield the absolute path of every file under a directory tree."""
    for dirpath, dirs, files in os.walk(folder):
        for filename in files:
            yield os.path.abspath(os.path.join(dirpath, filename))

# First pass: count the files (no total yet, so tqdm shows only a counter)
filescount = 0
for _ in tqdm(walkdir(target_dir)):
    filescount += 1

# Second pass: do the real work, now with a known total
for filepath in tqdm(walkdir(target_dir), total=filescount):
    sleep(0.01)
    # etc...

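Note that I defined a wrapper function around os.walk: since you are working on files rather than directories, it is better to define a function that advances over files rather than over directories.

You can get the same result without the walkdir wrapper, but it is more complicated, because you have to restore the previous progress bar state after each subfolder you traverse: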

# Precomputing
filescount = 0
for dirPath, subdirList, fileList in tqdm(os.walk(target_dir)):
    filescount += len(fileList)

# Computing for real
last_state = 0
for dirPath, subdirList, fileList in os.walk(target_dir):
    sleep(0.01)
    dirName = dirPath.split(os.path.sep)[-1]
    for fname in tqdm(fileList, total=filescount, initial=last_state):
        pass  # do whatever you want here...
    # Update last state to resume the progress bar
    last_state += len(fileList)
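
The start of this answer mentions file sizes as an alternative progress measure. As a hedged sketch of that variant: the helper names iter_files, total_bytes and process_with_byte_progress, and the handle callback, are all invented for this illustration. The unit and unit_scale arguments are regular tqdm options for showing human-readable byte counts. os.path.getsize can raise OSError on broken symlinks, which this sketch does not handle.

```python
import os
import tempfile


def iter_files(folder):
    """Yield the path of every file under folder."""
    for dirpath, _dirs, files in os.walk(folder):
        for name in files:
            yield os.path.join(dirpath, name)


def total_bytes(folder):
    """First pass: add up file sizes without keeping the list of files."""
    return sum(os.path.getsize(path) for path in iter_files(folder))


def process_with_byte_progress(folder, handle=None):
    """Second pass: advance the bar by each file's size instead of by 1."""
    # Third-party import kept local so the helpers above work without tqdm.
    from tqdm import tqdm

    with tqdm(total=total_bytes(folder), unit="B", unit_scale=True) as pbar:
        for path in iter_files(folder):
            if handle is not None:
                handle(path)  # hypothetical per-file work
            pbar.update(os.path.getsize(path))


# Small check of the size pass on a throwaway tree.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "a.bin"), "wb") as fh:
        fh.write(b"abc")
    os.makedirs(os.path.join(demo_dir, "sub"))
    with open(os.path.join(demo_dir, "sub", "b.bin"), "wb") as fh:
        fh.write(b"12345")
    demo_total = total_bytes(demo_dir)

print(demo_total)
```

Counting bytes tends to give a steadier ETA than counting files when the per-file work scales with file size, for example when hashing or copying.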

Answer 4

Here is how I solved a similar problem:

import os
import time

import tqdm

for root, dirs, files in os.walk(local_path):
    # One progress bar per directory, sized by the number of files in it
    for fname in tqdm.tqdm(files):
        time.sleep(0.1)
        full_fname = os.path.join(root, fname)

Answer 5

Here is a more concise way to precompute the file count and then show a status bar over the files:

file_count = sum(len(files) for _, _, files in os.walk(folder))  # Get the number of files
with tqdm(total=file_count) as pbar:  # Do tqdm this way
    for root, dirs, files in os.walk(folder):  # Walk the directory
        for name in files:
            pbar.update(1)  # Increment the progress bar
            # Process the file in the walk

Answer 6

You can use tqdm this way to process all the files under a directory path.

import os

from tqdm import tqdm

target_dir = os.path.join(os.getcwd(), "..Your path name")  # it has 212 files
for r, d, f in os.walk(target_dir):
    for file in tqdm(f, total=len(f)):
        filepath = os.path.join(r, file)
        # f'Your operation on file..{filepath}'

20%|████████████████████ | 42/212 [05:07<17:58, 6.35s/it]

This way you get progress output like the above.

Use tqdm on the for loop inside your function to check the progress.
