Get image size without downloading it in Python

Time: 2011-09-18 07:58:58

Tags: python image url

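How can I get an image's dimensions without actually downloading it? Is that even possible? I have a list of image URLs and I want to assign a width and height to each one.

I know there is a way to do it locally (How to check dimensions of all images in a directory using python?), but I don't want to download all the images.

Edit:

Following ed.'s suggestion, I edited the code and came up with this code. I'm not sure whether it downloads the whole file or just a part of it (which is what I want).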

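9 answers:

Answer 0 (score: 17)

This is based on ed.'s answer plus some other things I found on the web. I ran into the same problem as grotos with .read(24). Download getimageinfo.py from here and ReSeekFile.py from here.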

import urllib2
import getimageinfo

# href is the URL of the image
imgdata = urllib2.urlopen(href)
image_type,width,height = getimageinfo.getImageInfo(imgdata)

Modify getimageinfo ...

import ReseekFile

def getImageInfo(datastream):
    datastream = ReseekFile.ReseekFile(datastream)
    data = str(datastream.read(30))

#Skipping to jpeg

# handle JPEGs
elif (size >= 2) and data.startswith('\377\330'):
    content_type = 'image/jpeg'
    datastream.seek(0)
    datastream.read(2)
    b = datastream.read(1)
    try:
        while (b and ord(b) != 0xDA):
            while (ord(b) != 0xFF): b = datastream.read(1)
            while (ord(b) == 0xFF): b = datastream.read(1)
            if (ord(b) >= 0xC0 and ord(b) <= 0xC3):
                datastream.read(3)
                h, w = struct.unpack(">HH", datastream.read(4))
                break
            else:
                datastream.read(int(struct.unpack(">H", datastream.read(2))[0])-2)
            b = datastream.read(1)
        width = int(w)
        height = int(h)
    except struct.error:
        pass
    except ValueError:
        pass

Answer 1 (score: 14)

I found that the solution on this site works well:

import urllib
import ImageFile

def getsizes(uri):
    # get file size *and* image size (None if not known)
    file = urllib.urlopen(uri)
    size = file.headers.get("content-length")
    if size: size = int(size)
    p = ImageFile.Parser()
    while 1:
        data = file.read(1024)
        if not data:
            break
        p.feed(data)
        if p.image:
            # close the connection before returning early
            file.close()
            return size, p.image.size
    file.close()
    return size, None

print getsizes("http://www.pythonware.com/images/small-yoyo.gif")
# (10965, (179, 188))

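Answer 2 (score: 10)

If you are willing to download the first 24 bytes of each file, then this function (mentioned in johnteslade's answer to the question you linked) will work out the dimensions.

That is probably the least downloading needed to do what you want.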

import urllib2
start = urllib2.urlopen(image_url).read(24)
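As a rough illustration of why 24 bytes can be enough, here is a minimal Python 3 sketch that reads the dimensions of a PNG or GIF from its leading bytes. The name size_from_header and the hand-built sample headers are made up for this example and are not part of the linked function; JPEG is deliberately left out.

```python
import struct

def size_from_header(head):
    """Return (width, height) for PNG or GIF header bytes, or None if unrecognised."""
    # PNG: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height
    if (len(head) >= 24 and head.startswith(b"\x89PNG\r\n\x1a\n")
            and head[12:16] == b"IHDR"):
        return struct.unpack(">LL", head[16:24])
    # GIF: 6-byte signature, then little-endian 16-bit width and height
    if len(head) >= 10 and head[:6] in (b"GIF87a", b"GIF89a"):
        return struct.unpack("<HH", head[6:10])
    return None

# Hand-built headers standing in for the first bytes of real files
png_head = (b"\x89PNG\r\n\x1a\n" + struct.pack(">L", 13) + b"IHDR"
            + struct.pack(">LL", 640, 480))
gif_head = b"GIF89a" + struct.pack("<HH", 179, 188)

print(size_from_header(png_head))  # (640, 480)
print(size_from_header(gif_head))  # (179, 188)
```

In practice the bytes passed in would be whatever the first read(24) on the response returns.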

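Edit (1):

For JPEG files it seems to need more bytes. You could edit the function so that, instead of reading from StringIO.StringIO(data), it reads from the file handle returned by urlopen. It will then read exactly as much of the image as it needs to determine the width and height.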
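The idea of reading straight from the file handle could look roughly like the Python 3 sketch below. It walks the JPEG segments and stops at the first SOF0 to SOF3 frame header, so nothing after that point is read. The name jpeg_size and the hand-built sample are assumptions for illustration; the sketch ignores standalone markers and assumes read(n) returns n bytes unless the stream has ended.

```python
import io
import struct

def jpeg_size(stream):
    """Read a JPEG from a file-like object just far enough to find its size.

    Returns (width, height), or None if no frame header is found.
    """
    if stream.read(2) != b"\xff\xd8":       # every JPEG starts with the SOI marker
        return None
    while True:
        byte = stream.read(1)
        while byte and byte != b"\xff":     # skip forward to the next 0xFF
            byte = stream.read(1)
        while byte == b"\xff":              # skip fill bytes, land on the marker code
            byte = stream.read(1)
        if not byte:
            return None                     # stream ended before a frame header
        marker = byte[0]
        if marker == 0xDA:                  # start of scan: image data begins here
            return None
        header = stream.read(2)
        if len(header) < 2:
            return None
        (length,) = struct.unpack(">H", header)
        if 0xC0 <= marker <= 0xC3:
            # frame header: 1-byte precision, then height, then width
            body = stream.read(5)
            if len(body) < 5:
                return None
            _, height, width = struct.unpack(">BHH", body)
            return width, height
        stream.read(length - 2)             # skip the rest of this segment

# Hand-built stream: SOI, one APP0 segment, then an SOF0 frame header
sample = (b"\xff\xd8"
          + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + b"\x00" * 9
          + b"\xff\xc0" + struct.pack(">HBHH", 17, 8, 188, 179) + b"\x00" * 10)
print(jpeg_size(io.BytesIO(sample)))  # (179, 188)
```

Passing the response object from urlopen instead of the BytesIO should behave the same way, since only read() is used.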

Answer 3 (score: 4)

The getimageinfo.py mentioned above does not work in Python 3. Use Pillow instead.

Pillow is available on PyPI, or you can install it with pip: pip install pillow.


from io import BytesIO
from PIL import Image
import requests
hrefs = ['https://farm4.staticflickr.com/3894/15008518202_b016d7d289_m.jpg','https://farm4.staticflickr.com/3920/15008465772_383e697089_m.jpg','https://farm4.staticflickr.com/3902/14985871946_86abb8c56f_m.jpg']
RANGE = 5000
for href in hrefs:
    req  = requests.get(href,headers={'User-Agent':'Mozilla5.0(Google spider)','Range':'bytes=0-{}'.format(RANGE)})
    im = Image.open(BytesIO(req.content))

    print(im.size)

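Answer 4 (score: 4)

I like the solution I found here, which downloads chunks of the image until PIL can recognise it as an image file and then stops downloading. This ensures that enough of the image header is downloaded to read the dimensions, but no more. (I have adapted it for Python 3+.)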

import urllib.request
from PIL import ImageFile

def getsizes(uri):
    # get file size *and* image size (None if not known)
    file = urllib.request.urlopen(uri)
    size = file.headers.get("content-length")
    if size: 
        size = int(size)
    p = ImageFile.Parser()
    while True:
        data = file.read(1024)
        if not data:
            break
        p.feed(data)
        if p.image:
            # close the connection before returning early
            file.close()
            return size, p.image.size
    file.close()
    return size, None

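Answer 5 (score: 2)

It is not possible to do this directly, but there is a workaround. If the files are on your own server, implement an API endpoint that takes the image name as a parameter and returns the size.

But if the files are on a different server, there is no way to do it other than downloading them.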
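For the same-server case, the endpoint idea could be sketched as a small WSGI app like the one below. Everything here is hypothetical: the names size_app and png_size, the IMAGE_DIR directory, and the name query parameter are chosen for illustration, and only PNG files are handled to keep the example short.

```python
import json
import os
import struct
from urllib.parse import parse_qs

IMAGE_DIR = "images"  # hypothetical directory holding the server's images

def png_size(path):
    """Return (width, height) from a PNG file's first 24 bytes, or None."""
    with open(path, "rb") as f:
        head = f.read(24)
    if (len(head) == 24 and head.startswith(b"\x89PNG\r\n\x1a\n")
            and head[12:16] == b"IHDR"):
        return struct.unpack(">LL", head[16:24])
    return None

def size_app(environ, start_response):
    """WSGI app: GET /?name=foo.png returns {"width": ..., "height": ...} as JSON."""
    name = parse_qs(environ.get("QUERY_STRING", "")).get("name", [""])[0]
    # basename() keeps the lookup inside IMAGE_DIR
    path = os.path.join(IMAGE_DIR, os.path.basename(name))
    size = png_size(path) if os.path.isfile(path) else None
    if size is None:
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "unknown image"}']
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps({"width": size[0], "height": size[1]}).encode("utf-8")]
```

For local experiments such an app can be served with wsgiref.simple_server.make_server from the standard library; the client then receives a few bytes of JSON instead of the image.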

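Answer 6 (score: 1)

Unfortunately I can't comment, so here is an answer:

Use a GET request with the header

"Range": "bytes=0-30"

and then simply use

http://code.google.com/p/bfg-pages/source/browse/trunk/pages/getimageinfo.py

If you use Python's "requests" library, it is just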

r = requests.get(image_url, headers={
    "Range": "bytes=0-30"
})
image_info = get_image_info(r.content)

This fixes ed.'s answer without any other dependencies (such as ReSeekFile.py).
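The same Range request can be made with only the standard library. The sketch below is one possible way to do it, with hypothetical helper names (build_range_request, fetch_head) and a placeholder URL. A server is free to ignore the Range header and reply 200 with the whole file, so the read is capped on the client side as well.

```python
import urllib.request

def build_range_request(url, last_byte):
    """Build a GET request asking only for bytes 0..last_byte (inclusive)."""
    return urllib.request.Request(
        url, headers={"Range": "bytes=0-{}".format(last_byte)})

def fetch_head(url, last_byte=30):
    """Return (data, honoured): at most last_byte + 1 bytes, and whether the
    server answered 206 Partial Content."""
    with urllib.request.urlopen(build_range_request(url, last_byte)) as resp:
        honoured = resp.status == 206
        # A server that ignores Range sends the full body, so cap the read here
        return resp.read(last_byte + 1), honoured

# Placeholder URL; nothing is fetched until fetch_head() is called
req = build_range_request("http://example.com/picture.png", 30)
print(req.get_header("Range"))  # bytes=0-30
```

As other answers note, 31 bytes is usually not enough for JPEG, so a larger last_byte is a reasonable choice when the image type is unknown.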

Answer 7 (score: 1)

My fixed "getimageInfo.py", working with Python 3.4+. Try it, it's awesome!

import io
import struct
import urllib.request as urllib2

def getImageInfo(data):
    data = data
    size = len(data)
    #print(size)
    height = -1
    width = -1
    content_type = ''

    # handle GIFs
    if (size >= 10) and data[:6] in (b'GIF87a', b'GIF89a'):
        # Check to see if content_type is correct
        content_type = 'image/gif'
        w, h = struct.unpack(b"<HH", data[6:10])
        width = int(w)
        height = int(h)

    # See PNG 2. Edition spec (http://www.w3.org/TR/PNG/)
    # Bytes 0-7 are below, 4-byte chunk length, then 'IHDR'
    # and finally the 4-byte width, height
    elif ((size >= 24) and data.startswith(b'\211PNG\r\n\032\n')
          and (data[12:16] == b'IHDR')):
        content_type = 'image/png'
        w, h = struct.unpack(b">LL", data[16:24])
        width = int(w)
        height = int(h)

    # Maybe this is for an older PNG version.
    elif (size >= 16) and data.startswith(b'\211PNG\r\n\032\n'):
        # Check to see if we have the right content type
        content_type = 'image/png'
        w, h = struct.unpack(b">LL", data[8:16])
        width = int(w)
        height = int(h)

    # handle JPEGs
    elif (size >= 2) and data.startswith(b'\377\330'):
        content_type = 'image/jpeg'
        jpeg = io.BytesIO(data)
        jpeg.read(2)
        b = jpeg.read(1)
        try:
            while (b and ord(b) != 0xDA):
                while (ord(b) != 0xFF): b = jpeg.read(1)
                while (ord(b) == 0xFF): b = jpeg.read(1)
                if (ord(b) >= 0xC0 and ord(b) <= 0xC3):
                    jpeg.read(3)
                    h, w = struct.unpack(b">HH", jpeg.read(4))
                    break
                else:
                    jpeg.read(int(struct.unpack(b">H", jpeg.read(2))[0])-2)
                b = jpeg.read(1)
            width = int(w)
            height = int(h)
        except struct.error:
            pass
        except ValueError:
            pass

    return content_type, width, height



#from PIL import Image
#import requests
#hrefs = ['http://farm4.staticflickr.com/3894/15008518202_b016d7d289_m.jpg','https://farm4.staticflickr.com/3920/15008465772_383e697089_m.jpg','https://farm4.staticflickr.com/3902/14985871946_86abb8c56f_m.jpg']
#RANGE = 5000
#for href in hrefs:
    #req  = requests.get(href,headers={'User-Agent':'Mozilla5.0(Google spider)','Range':'bytes=0-{}'.format(RANGE)})
    #im = getImageInfo(req.content)

    #print(im)
# Range needs a unit and a byte span; a bare "5000" is not a valid value
req = urllib2.Request("http://vn-sharing.net/forum/images/smilies/onion/ngai.gif", headers={"Range": "bytes=0-5000"})
r = urllib2.urlopen(req)
#f = open("D:\\Pictures\\1.jpg", "rb")
print(getImageInfo(r.read()))
# Output: >> ('image/gif', 50, 50)
#print(getImageInfo(f.read()))

Source code: http://code.google.com/p/bfg-pages/source/browse/trunk/pages/getimageinfo.py

Answer 8 (score: 0)

import requests
from PIL import Image
from io import BytesIO

url = 'http://farm4.static.flickr.com/3488/4051378654_238ca94313.jpg'

img_data = requests.get(url).content    
im = Image.open(BytesIO(img_data))
print (im.size)