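How can I get the dimensions of an image without actually downloading it? Is that even possible? I have a list of image URLs and I want to assign a width and height to each one.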
I know there is a way to do it locally (How to check dimensions of all images in a directory using python?), but I don't want to download all the images.
Edit:
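Following ed.'s suggestions, I edited the code and came up with this code. I'm not sure whether it downloads the whole file or just a part of it (which is what I want).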
Answer 0 (score: 17)
This is based on ed's answer mixed with other things I found on the web. I ran into the same problem as grotos with .read(24). Download getimageinfo.py from here and ReseekFile.py from here.
import urllib2
import getimageinfo  # the downloaded getimageinfo.py

imgdata = urllib2.urlopen(href)
image_type, width, height = getimageinfo.getImageInfo(imgdata)
Modify getimageinfo as follows:
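import struct
import ReseekFile

def getImageInfo(datastream):
    # Wrap the non-seekable response so it can be rewound after sniffing
    datastream = ReseekFile.ReseekFile(datastream)
    data = str(datastream.read(30))

    # ... unchanged part of getImageInfo (size = len(data), the width/height
    # defaults and the GIF/PNG branches) skipped here ...

    # handle JPEGs
    elif (size >= 2) and data.startswith('\377\330'):
        content_type = 'image/jpeg'
        # Rewind and walk the JPEG markers straight from the stream, so only
        # the bytes up to the size-carrying segment are downloaded
        datastream.seek(0)
        datastream.read(2)
        b = datastream.read(1)
        try:
            while (b and ord(b) != 0xDA):
                while (b and ord(b) != 0xFF): b = datastream.read(1)
                while (b and ord(b) == 0xFF): b = datastream.read(1)
                # Stop at end of data or at start of scan: no size found
                if not b or ord(b) == 0xDA:
                    break
                if (ord(b) >= 0xC0 and ord(b) <= 0xC3):
                    datastream.read(3)
                    h, w = struct.unpack(">HH", datastream.read(4))
                    width = int(w)
                    height = int(h)
                    break
                else:
                    datastream.read(int(struct.unpack(">H", datastream.read(2))[0])-2)
                b = datastream.read(1)
        except struct.error:
            pass
        except ValueError:
            pass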
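ReseekFile is needed here because an HTTP response can only be read forward, while the modified getImageInfo reads 30 bytes and then calls seek(0). The idea can be sketched in Python 3 as a small wrapper that keeps every byte it has pulled from the underlying stream in memory, so earlier positions can be revisited without downloading anything twice. This is a hypothetical stand-in written for illustration, not ReseekFile's actual code; `RewindableReader` is my own name, and it only supports the read/seek/tell calls the JPEG branch above uses.

```python
class RewindableReader:
    """Wrap a forward-only stream (e.g. an HTTP response) so seek() works.

    Hypothetical helper: every byte read from the underlying stream is kept
    in memory, so only the prefix that was actually requested is fetched.
    """

    def __init__(self, raw):
        self._raw = raw
        self._buf = bytearray()   # all bytes pulled from raw so far
        self._pos = 0             # current logical position

    def _fill_to(self, end):
        # Pull from the raw stream until the buffer covers [0, end) or EOF
        while len(self._buf) < end:
            chunk = self._raw.read(end - len(self._buf))
            if not chunk:
                break
            self._buf += chunk

    def read(self, n=-1):
        if n is None or n < 0:
            self._buf += self._raw.read()   # read everything that is left
            end = len(self._buf)
        else:
            end = self._pos + n
            self._fill_to(end)
        data = bytes(self._buf[self._pos:end])
        self._pos += len(data)
        return data

    def seek(self, pos, whence=0):
        if whence == 0:
            target = pos
        elif whence == 1:
            target = self._pos + pos
        else:
            raise ValueError("seeking relative to the end is not supported")
        if target < 0:
            raise ValueError("negative seek position")
        self._pos = target
        return self._pos

    def tell(self):
        return self._pos
```

Usage would be along the lines of wrapping `urllib.request.urlopen(url)` in `RewindableReader(...)` and passing it to a parser that needs to rewind. Because the buffer only grows as far as the furthest read, the network cost stays equal to the largest offset the parser touches.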
Answer 1 (score: 14)
I found that the solution on this site works well:
import urllib
import ImageFile  # classic PIL; with Pillow use: from PIL import ImageFile

def getsizes(uri):
    # get file size *and* image size (None if not known)
    file = urllib.urlopen(uri)
    size = file.headers.get("content-length")
    if size: size = int(size)
    p = ImageFile.Parser()
    while 1:
        data = file.read(1024)
        if not data:
            break
        p.feed(data)
        if p.image:
            return size, p.image.size
    file.close()
    return size, None

print getsizes("http://www.pythonware.com/images/small-yoyo.gif")
# (10965, (179, 188))
Answer 2 (score: 10)
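If you're willing to download the first 24 bytes of each file, then this function (mentioned in johnteslade's answer to the question you linked) will work out the dimensions.
That's probably the smallest download needed to do what you want.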
import urllib2
start = urllib2.urlopen(image_url).read(24)
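Why 24 bytes is enough for GIF and PNG: a GIF stores its logical screen width and height as little-endian 16-bit integers at offsets 6-9, and a PNG stores width and height as big-endian 32-bit integers in the IHDR chunk at offsets 16-23. A minimal Python 3 sketch of that check on the bytes returned by `read(24)` (`size_from_prefix` is a hypothetical name; JPEG is deliberately left out because its size can sit much further into the file):

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def size_from_prefix(data):
    """Return (format, width, height) from the first 24 bytes, or None."""
    # GIF: "GIF87a"/"GIF89a", then width and height as little-endian uint16
    if len(data) >= 10 and data[:6] in (b"GIF87a", b"GIF89a"):
        width, height = struct.unpack("<HH", data[6:10])
        return "gif", width, height
    # PNG: 8-byte signature, 4-byte chunk length, b"IHDR", then
    # width and height as big-endian uint32
    if len(data) >= 24 and data[:8] == PNG_SIGNATURE and data[12:16] == b"IHDR":
        width, height = struct.unpack(">II", data[16:24])
        return "png", width, height
    return None  # unknown format, or too few bytes
```

Combined with the snippet above, this would be `size_from_prefix(urllib.request.urlopen(image_url).read(24))` in Python 3.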
Edit (1):
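For JPEG files it seems to need more bytes. You could edit the function so that instead of reading from StringIO.StringIO(data) it reads from the file handle returned by urlopen. Then it will read exactly as much of the image as it needs to determine the width and height.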
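Reading straight from the urlopen handle works because JPEG dimensions live in a start-of-frame (SOF) segment, and every segment before it starts with a 0xFF marker and a 2-byte length that lets a reader skip it without seeking. A Python 3 sketch of that forward-only walk, under the assumption that the stream's `read(n)` returns `n` bytes until EOF (true for `io.BytesIO` and `urlopen` responses). `jpeg_size_from_stream` is a hypothetical name, and unlike the getimageinfo code, which only checks markers 0xC0-0xC3, it also accepts the progressive and other SOF variants:

```python
import struct

# Start-of-frame markers that carry the image size. 0xC4 (DHT), 0xC8 (JPG)
# and 0xCC (DAC) share the range but are not frame headers.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_size_from_stream(stream):
    """Return (width, height) of a JPEG, reading the stream forward only.

    Stops right after the SOF segment, so the rest of the file is never
    requested from `stream`. Returns None if the data is not a JPEG or no
    SOF segment appears before the image data starts.
    """
    if stream.read(2) != b"\xff\xd8":               # SOI marker
        return None
    while True:
        byte = stream.read(1)
        while byte and byte != b"\xff":              # find the next marker
            byte = stream.read(1)
        while byte == b"\xff":                        # skip 0xFF fill bytes
            byte = stream.read(1)
        if not byte:
            return None                               # ran out of data
        marker = byte[0]
        if marker in (0xD9, 0xDA):                    # EOI or SOS: no SOF found
            return None
        if 0xD0 <= marker <= 0xD7 or marker == 0x01:
            continue                                   # markers without a length
        raw_len = stream.read(2)
        if len(raw_len) < 2:
            return None
        (length,) = struct.unpack(">H", raw_len)      # includes its own 2 bytes
        if length < 2:
            return None
        if marker in SOF_MARKERS:
            frame = stream.read(5)                     # precision, height, width
            if len(frame) < 5:
                return None
            _precision, height, width = struct.unpack(">BHH", frame)
            return width, height
        stream.read(length - 2)                        # skip this segment
```

With a response object this would look like `with urllib.request.urlopen(url) as resp: print(jpeg_size_from_stream(resp))`. Closing the response right after the SOF segment keeps the transfer roughly to the header, though some extra bytes may already be buffered by the socket.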
Answer 3 (score: 4)
Since the getimageinfo.py mentioned above doesn't work in Python 3, use Pillow instead.
Pillow is available on PyPI, or you can install it with pip: pip install pillow
from io import BytesIO

import requests
from PIL import Image

hrefs = ['https://farm4.staticflickr.com/3894/15008518202_b016d7d289_m.jpg',
         'https://farm4.staticflickr.com/3920/15008465772_383e697089_m.jpg',
         'https://farm4.staticflickr.com/3902/14985871946_86abb8c56f_m.jpg']
RANGE = 5000

for href in hrefs:
    # Ask for only the first bytes of the file (Range end offsets are inclusive)
    req = requests.get(href, headers={'User-Agent': 'Mozilla5.0(Google spider)',
                                      'Range': 'bytes=0-{}'.format(RANGE)})
    # Image.open parses only the header, so a truncated file is usually enough for .size
    im = Image.open(BytesIO(req.content))
    print(im.size)
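The Range header only saves bandwidth if the server honors it: a server that supports ranges replies with 206 Partial Content, while one that ignores the header replies 200 with the whole file, and `requests.get` (without `stream=True`) will then download all of it. A standard-library sketch that sends the same kind of header but also stops reading after the wanted number of bytes either way; `fetch_prefix` and its defaults are hypothetical choices, not part of the answer above. Note that `bytes=0-4999` asks for exactly 5000 bytes because the end offset is inclusive.

```python
import urllib.request


def fetch_prefix(url, nbytes=5000, timeout=10):
    """Return (HTTP status, first `nbytes` bytes of the resource at `url`)."""
    # Range end offsets are inclusive: bytes=0-(n-1) is exactly n bytes
    req = urllib.request.Request(url, headers={"Range": "bytes=0-{}".format(nbytes - 1)})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # 206 = the server honored the Range header; 200 = it ignored the
        # header and is sending the whole file, so read only what we need
        # and let the `with` block close the connection
        return resp.status, resp.read(nbytes)
```

The returned bytes can go straight into `Image.open(BytesIO(data))` as in the answer above.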
Answer 4 (score: 4)
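I like the solution I found: it downloads chunks of the image until PIL can identify it as an image file, and then stops downloading. This ensures enough of the image header is downloaded to read the dimensions, but no more. (I found this here and here; I've adapted it to Python 3+.)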
import urllib.request  # "import urllib" alone does not guarantee urllib.request is loaded
from PIL import ImageFile

def getsizes(uri):
    # get file size *and* image size (None if not known)
    with urllib.request.urlopen(uri) as file:
        size = file.headers.get("content-length")
        if size:
            size = int(size)
        p = ImageFile.Parser()
        while True:
            data = file.read(1024)
            if not data:
                break
            p.feed(data)
            if p.image:
                # Stop as soon as PIL has parsed the header
                return size, p.image.size
    return size, None
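The loop above stops as soon as the parser succeeds, but if the URL points at something PIL cannot identify, it keeps reading until the end of the response. A sketch of the same read-until-parsed pattern with an upper bound on the bytes read; `read_until_parsed`, `max_bytes`, and the `parse` callback (any function that returns None until it has seen enough bytes, such as a small header parser) are hypothetical names chosen for illustration:

```python
def read_until_parsed(stream, parse, chunk_size=1024, max_bytes=256 * 1024):
    """Feed a growing prefix of `stream` to `parse` until it returns non-None.

    Returns (result, bytes_read); result is None if `parse` never succeeded
    within `max_bytes` or the stream ended first.
    """
    buf = b""
    while len(buf) < max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes - len(buf)))
        if not chunk:
            break                      # end of stream
        buf += chunk
        result = parse(buf)
        if result is not None:
            return result, len(buf)    # stop downloading here
    return None, len(buf)              # gave up: cap reached or stream ended
```

Re-parsing the whole buffer on every chunk is wasteful in principle, but image headers are small enough that it rarely matters; a stateful parser such as Pillow's `ImageFile.Parser` would instead be fed only the new chunk each time.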
Answer 5 (score: 2)
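It's not possible to do this directly, but there is a workaround. If the files are on your own server, implement an API endpoint that takes the image name as a parameter and returns its size.
But if the files are on a different server, you have no option other than downloading them.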
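For the first case, the endpoint can be very small. A standard-library sketch, assuming the images sit in a local directory; `IMAGE_DIR`, the `/size?name=...` query format, and the JSON shape are all hypothetical choices, and only PNG and GIF headers are handled to stay dependency-free (a real service would more likely call Pillow's `Image.open(path).size`):

```python
import json
import os
import struct
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

IMAGE_DIR = "images"  # hypothetical directory holding the served images


def read_png_or_gif_size(path):
    """Return (width, height) from a PNG or GIF header, or None."""
    with open(path, "rb") as f:
        head = f.read(24)
    if len(head) >= 24 and head[:8] == b"\x89PNG\r\n\x1a\n" and head[12:16] == b"IHDR":
        return struct.unpack(">II", head[16:24])
    if len(head) >= 10 and head[:6] in (b"GIF87a", b"GIF89a"):
        return struct.unpack("<HH", head[6:10])
    return None


def size_lookup(name, image_dir=IMAGE_DIR):
    """Return (HTTP status, JSON-serializable body) for one image name."""
    # basename() stops names like "../secret" from leaving image_dir
    path = os.path.join(image_dir, os.path.basename(name))
    if not os.path.isfile(path):
        return 404, {"error": "not found"}
    size = read_png_or_gif_size(path)
    if size is None:
        return 415, {"error": "unsupported image format"}
    return 200, {"width": size[0], "height": size[1]}


class SizeHandler(BaseHTTPRequestHandler):
    """GET /size?name=photo.png -> {"width": ..., "height": ...}"""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/size":
            status, body = 404, {"error": "unknown endpoint"}
        else:
            name = parse_qs(url.query).get("name", [""])[0]
            status, body = size_lookup(name)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

# To serve it: HTTPServer(("127.0.0.1", 8000), SizeHandler).serve_forever()
```

Clients then fetch a few bytes of JSON per image instead of the image itself.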
Answer 6 (score: 1)
Unfortunately I can't comment yet, so this is an answer:
Use a GET request with the header
"Range": "bytes=0-30"
and then simply use
http://code.google.com/p/bfg-pages/source/browse/trunk/pages/getimageinfo.py
If you use Python's "requests", it's just:
import requests
import getimageinfo  # the getimageinfo.py linked above

r = requests.get(image_url, headers={
    "Range": "bytes=0-30"
})
image_info = getimageinfo.getImageInfo(r.content)
This fixes ed.'s answer without any extra dependencies (such as ReseekFile.py).
Answer 7 (score: 1)
My fixed "getimageInfo.py", which works with Python 3.4+. Give it a try, it's great!
import io
import struct
import urllib.request as urllib2

def getImageInfo(data):
    size = len(data)
    height = -1
    width = -1
    content_type = ''

    # handle GIFs
    if (size >= 10) and data[:6] in (b'GIF87a', b'GIF89a'):
        # Check to see if content_type is correct
        content_type = 'image/gif'
        w, h = struct.unpack(b"<HH", data[6:10])
        width = int(w)
        height = int(h)

    # See PNG 2. Edition spec (http://www.w3.org/TR/PNG/)
    # Bytes 0-7 are below, 4-byte chunk length, then 'IHDR'
    # and finally the 4-byte width, height
    elif ((size >= 24) and data.startswith(b'\211PNG\r\n\032\n')
          and (data[12:16] == b'IHDR')):
        content_type = 'image/png'
        w, h = struct.unpack(b">LL", data[16:24])
        width = int(w)
        height = int(h)

    # Maybe this is for an older PNG version.
    elif (size >= 16) and data.startswith(b'\211PNG\r\n\032\n'):
        # Check to see if we have the right content type
        content_type = 'image/png'
        w, h = struct.unpack(b">LL", data[8:16])
        width = int(w)
        height = int(h)

    # handle JPEGs
    elif (size >= 2) and data.startswith(b'\377\330'):
        content_type = 'image/jpeg'
        jpeg = io.BytesIO(data)
        jpeg.read(2)
        b = jpeg.read(1)
        try:
            while b:
                # Skip to the next marker, then past any 0xFF fill bytes
                while b and ord(b) != 0xFF:
                    b = jpeg.read(1)
                while b and ord(b) == 0xFF:
                    b = jpeg.read(1)
                # Stop at end of data or at start of scan (0xDA): size not found
                if not b or ord(b) == 0xDA:
                    break
                if 0xC0 <= ord(b) <= 0xC3:
                    jpeg.read(3)
                    h, w = struct.unpack(b">HH", jpeg.read(4))
                    width = int(w)
                    height = int(h)
                    break
                else:
                    jpeg.read(int(struct.unpack(b">H", jpeg.read(2))[0]) - 2)
                b = jpeg.read(1)
        except struct.error:
            pass
        except ValueError:
            pass

    return content_type, width, height
# Alternative: fetch only the first bytes with requests
# import requests
# hrefs = ['http://farm4.staticflickr.com/3894/15008518202_b016d7d289_m.jpg','https://farm4.staticflickr.com/3920/15008465772_383e697089_m.jpg','https://farm4.staticflickr.com/3902/14985871946_86abb8c56f_m.jpg']
# RANGE = 5000
# for href in hrefs:
#     req = requests.get(href, headers={'User-Agent': 'Mozilla5.0(Google spider)', 'Range': 'bytes=0-{}'.format(RANGE)})
#     im = getImageInfo(req.content)
#     print(im)

# A valid Range header has the form "bytes=first-last"
req = urllib2.Request("http://vn-sharing.net/forum/images/smilies/onion/ngai.gif", headers={"Range": "bytes=0-5000"})
r = urllib2.urlopen(req)
print(getImageInfo(r.read()))
# Output: >> ('image/gif', 50, 50)

# Or with a local file:
# with open("D:\\Pictures\\1.jpg", "rb") as f:
#     print(getImageInfo(f.read()))
Source code: http://code.google.com/p/bfg-pages/source/browse/trunk/pages/getimageinfo.py
Answer 8 (score: 0)
import requests
from PIL import Image
from io import BytesIO
url = 'http://farm4.static.flickr.com/3488/4051378654_238ca94313.jpg'
img_data = requests.get(url).content
im = Image.open(BytesIO(img_data))
print (im.size)