Question

I am using the requests library to download some images from a website. My code checks the file size after each download. Sample code:

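import os
import requests

def download(url, store_dir):
    # headers and proxies are defined elsewhere in my script
    r = requests.get(url, headers=headers, proxies=proxies)

    # take the file name from the Content-Disposition header
    filename = r.headers.get('content-disposition').split('=')[1]

    # size the server claims to be sending
    real_length = int(r.headers.get('content-length'))

    wholepath = os.path.join(store_dir, filename)

    # the with block closes the file, so no explicit close() is needed
    with open(wholepath, 'wb') as f:
        f.write(r.content)

    if os.path.getsize(wholepath) != real_length:
        print('size error')
        print('status_code: %s' % r.status_code)
        print('headers: %s' % r.headers)
        print('url: %s' % url)
        print('origin:', r.headers['content-length'], 'now', os.path.getsize(wholepath))
        # retry the download (plain function call, there is no self here)
        download(url, store_dir)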

But I often find that even when os.path.getsize(wholepath) == real_length, the image file is corrupted. How can I solve this?

Answer 1

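Almost 4 years on. I had forgotten about this question until I got a downvote today.

Let me close this question out:

There is no perfect way to verify a file without a hash string.

But if you are scraping files from a website, you can check whether the page or URL contains a string that looks like an md5 / sha1 digest and try it. If you are lucky, it is the file's hash and you can verify the file against it. If not, there is no way.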
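
For illustration, here is a minimal sketch of that idea using only the standard library. The helper names find_hash_candidates and file_matches_hash are hypothetical, not part of requests, and the sketch assumes the site publishes a hex-encoded MD5 (32 characters) or SHA-1 (40 characters) digest of the exact bytes it serves. A match is good evidence that the file is intact; a mismatch may only mean the string was never the file's hash.

```python
import hashlib
import re

# Hex digests are 32 characters for MD5 and 40 for SHA-1.
HEX_DIGEST = re.compile(r'\b(?:[0-9a-fA-F]{40}|[0-9a-fA-F]{32})\b')

def find_hash_candidates(text):
    """Return substrings of text that look like MD5 or SHA-1 hex digests."""
    return [m.lower() for m in HEX_DIGEST.findall(text)]

def file_matches_hash(path, expected_hex):
    """Hash the file with the algorithm implied by the digest length and compare."""
    algorithms = {32: 'md5', 40: 'sha1'}
    name = algorithms.get(len(expected_hex))
    if name is None:
        raise ValueError('unsupported digest length: %d' % len(expected_hex))
    digest = hashlib.new(name)
    with open(path, 'rb') as f:
        # read in chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

In the download function from the question, one could run find_hash_candidates over the URL or the page text and, for each candidate, call file_matches_hash(wholepath, candidate) after writing the file.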
