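I am trying to download an original image (PNG format) from a URL, convert it on the fly (without saving it to disk), and save it as a JPG.
The code is as follows: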
import os
import io
import requests
from PIL import Image
...
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    i = Image.open(io.BytesIO(r.content))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)
It works, but when I try to monitor the download progress with r.iter_content() (for a future progress bar), like this:
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    for chunk in r.iter_content():
        print(len(chunk))
    i = Image.open(io.BytesIO(r.content))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)
I get this error:
Traceback (most recent call last):
  File "E:/GitHub/geoportal/quicklookScrape/temp.py", line 37, in <module>
    i = Image.open(io.BytesIO(r.content))
  File "C:\Python35\lib\site-packages\requests\models.py", line 736, in content
    'The content for this response was already consumed')
RuntimeError: The content for this response was already consumed
So is it possible to monitor the download progress and still get the data itself afterwards?
Answer 0 (score: 3)
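When you use r.iter_content(), you need to buffer the result somewhere. Unfortunately, I couldn't find any examples where the content is appended to an in-memory object; iter_content is usually used when a file cannot or should not be loaded into memory all at once. However, you can buffer it with tempfile.SpooledTemporaryFile, as described in this answer: https://stackoverflow.com/a/18550652/4527093. This avoids writing the image to disk (unless the image is larger than the specified max_size). You can then create the Image from the tempfile.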
import os
import io
import requests
from PIL import Image
import tempfile
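# Held in memory up to max_size bytes; spills to disk only beyond that.
buffer = tempfile.SpooledTemporaryFile(max_size=10**9)
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    downloaded = 0
    # Assumes the server sends a Content-Length header.
    filesize = int(r.headers['content-length'])
    # The default chunk_size is 1 byte, so request larger chunks.
    for chunk in r.iter_content(chunk_size=8192):
        downloaded += len(chunk)
        buffer.write(chunk)
        print(downloaded / filesize)
    buffer.seek(0)  # rewind before reading the buffered bytes back
    i = Image.open(io.BytesIO(buffer.read()))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)
buffer.close()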
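If the image is known to fit comfortably in memory, a plain io.BytesIO can probably serve as the buffer instead of a temporary file. The following is only a minimal sketch of that variant, not part of the original answer: the helper name buffer_with_progress and its on_progress callback are hypothetical, and it accepts any iterable of byte chunks so that it can be tried without a network connection.

```python
import io


def buffer_with_progress(chunks, total_size=None, on_progress=None):
    """Collect byte chunks into an in-memory buffer, reporting progress.

    chunks      -- any iterable of bytes, e.g. r.iter_content(chunk_size=8192)
    total_size  -- expected number of bytes, or None if unknown
    on_progress -- optional callback called as on_progress(downloaded, total_size)
    (All three names are illustrative, not part of requests or PIL.)
    """
    buffer = io.BytesIO()
    downloaded = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks
            continue
        buffer.write(chunk)
        downloaded += len(chunk)
        if on_progress is not None:
            on_progress(downloaded, total_size)
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer


# Hypothetical usage with the names from the question (img_url, out_dir):
#
#   r = requests.get(img_url, stream=True)
#   if r.status_code == 200:
#       total = int(r.headers['content-length'])  # assumes the header is present
#       buf = buffer_with_progress(r.iter_content(chunk_size=8192), total,
#                                  lambda done, size: print(done / size))
#       Image.open(buf).save(os.path.join(out_dir, 'image.jpg'), quality=85)
```

Since Image.open accepts a file-like object, the rewound buffer can be passed to it directly without an extra copy. The trade-off compared with SpooledTemporaryFile is that nothing ever spills to disk, so a very large download stays entirely in memory.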