Python - Converting a byte buffer to a file size

Time: 2017-06-25 18:19:03

Tags: python python-3.x size byte tqdm

I am writing a program that calculates the checksums of a list of files and then compares them against a reference file.

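I am trying to convert the byte buffer in the hashfile method into a file size expressed in the same unit that os.stat(path).st_size uses, so that I can update the tqdm progress bar accordingly. (I am trying to implement the last example here.)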

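I have tried many things, but nothing has worked so far:

- len(buf): gives me a processed size far larger than the total
- int.from_bytes(): OverflowError - int too large to convert to float
- struct.unpack_from(buf): needs to read one byte at a time
- various other functions for converting bytes

It seems I don't understand bytes well enough to know what to search for, or how to implement the solutions I find.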

Here is an excerpt of the code:

import hashlib
import os
from tqdm import tqdm

# calculate total size to process
self.assets_size += os.stat(os.path.join(root, f)).st_size

def hashfile(self, progress, afile, hasher, blocksize=65536):
    """
    Checksum buffer
    :param progress: progress bar object
    :param afile: file to process
    :param hasher: checksum algorithm
    :param blocksize: size of the buffer
    :return: hash digest
    """
    buf = afile.read(blocksize)

    while len(buf) > 0:
        self.processed_size += buf  # need to convert from bytes to file size
        hasher.update(buf)
        progress.update(self.processed_size)  # tqdm update
        buf = afile.read(blocksize)

    afile.seek(0)
    return hasher.digest()

def process_file(self, progress, fichier):
    """
    Checks if the file is in the reference dictionary;
    If so, checks if the size of the file matches the one stored in the dictionary;
    If so, calculates the checksum of the file and compares it to the one in the dictionary
    :param progress: progress bar object
    :param fichier: asset file to process
    :return: string outcome of the process
    """
    checksum = self.hashfile(progress, open(fichier, 'rb'), hashlib.sha1())
    # check if checksum matches
    return outcome

def main_process(self):
    """
    Launches and monitors the process and writes a report of the results
    :return: application end
    """
    with tqdm(total=self.assets_size, unit='B', unit_scale=True) as pbar:
        all_results = []

        for f in self.assets.keys():
            results = self.process_file(pbar, f)
            all_results.append(results)

    for r in all_results:
        print(r)

1 answer:

Answer 0 (score: 0)

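Thanks to @RadosławCybulski for finding the solution. I now understand how the tqdm.update() function works: it does not set the progress state to the argument, it adds the argument to the current state. Each call should therefore pass only the number of bytes just read, len(buf), rather than the running total. I updated the hashfile method like this: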

    while len(buf) > 0:
        hasher.update(buf)
        progress.update(len(buf))
        buf = afile.read(blocksize)
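
To see the additive behaviour without a real progress bar, here is a minimal sketch of the same loop. It assumes only that the progress object exposes an update(n) method that adds n to a running count, as tqdm does. ByteCounter and demo are hypothetical names introduced for illustration; they are not part of tqdm or of the original program. The sketch also checks that the sum of len(buf) over all reads matches os.stat(path).st_size, since both are counted in bytes.

```python
import hashlib
import os
import tempfile


class ByteCounter:
    """Hypothetical stand-in for a tqdm bar: update(n) adds n to a running count."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, n=1):
        self.n += n


def hashfile(progress, afile, hasher, blocksize=65536):
    """Hash an open binary file, advancing progress by the bytes read in each chunk."""
    # iter() with a sentinel keeps reading until read() returns b"" (end of file)
    for buf in iter(lambda: afile.read(blocksize), b""):
        hasher.update(buf)
        progress.update(len(buf))  # pass the increment, not the running total
    return hasher.digest()


def demo(size=200_000):
    """Write a temporary file of `size` bytes and hash it.

    Returns (bytes counted by the progress object, st_size, digest).
    """
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(size))
        path = tmp.name
    try:
        total = os.stat(path).st_size
        progress = ByteCounter(total)
        with open(path, "rb") as afile:
            digest = hashfile(progress, afile, hashlib.sha1())
        return progress.n, total, digest
    finally:
        os.remove(path)
```

With tqdm installed, a real bar can be passed in place of ByteCounter, for example one created with tqdm(total=self.assets_size, unit='B', unit_scale=True) as in main_process. Opening the file in a with block, as demo does, also ensures the handle is closed once hashing is finished.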