Question

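I'm downloading a lot of images from imgur.com with a Python script. Since all my links are in the format http://imgur.com/{id}, I have to force the download by replacing the original URL with http://i.imgur.com/{id}.gif, and I then save every image without an extension. (I know there is an Imgur API, but I can't use it because it has limits for this kind of job.)

Now, after downloading the images, I want to use the imghdr module to determine each image's original extension:

>>> import imghdr
>>> imghdr.what('/images/GrEdc')
'gif'

The problem is that this works with a success rate of about 80%. The remaining 20% are all identified as None, and after checking some of them I found that they are most likely all .jpg images.

Why can't imghdr detect the format? Even without an extension I can open these images with Ubuntu's default image viewer, so I don't think they are corrupted.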

Answer 1

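Note that, as of 2019, this bug has still not been fixed. The solution is available at the link Paul R provided.

One way to work around the problem is to monkeypatch imghdr: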

# Monkeypatch the JPEG detection bug in imghdr
from imghdr import tests

def test_jpeg1(h, f):
    """JPEG data in JFIF format"""
    if b'JFIF' in h[:23]:
        return 'jpeg'


JPEG_MARK = b'\xff\xd8\xff\xdb\x00C\x00\x08\x06\x06' \
            b'\x07\x06\x05\x08\x07\x07\x07\t\t\x08\n\x0c\x14\r\x0c\x0b\x0b\x0c\x19\x12\x13\x0f'

def test_jpeg2(h, f):
    """JPEG with small header"""
    if len(h) >= 32 and 67 == h[5] and h[:32] == JPEG_MARK:
        return 'jpeg'


def test_jpeg3(h, f):
    """JPEG data in JFIF or Exif format"""
    if h[6:10] in (b'JFIF', b'Exif') or h[:2] == b'\xff\xd8':
        return 'jpeg'

tests.append(test_jpeg1)
tests.append(test_jpeg2)
tests.append(test_jpeg3)
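
The broadest of the three tests above is the last one: it accepts any data that starts with the two-byte JPEG start-of-image marker (FF D8). As a minimal sketch of the same idea without imghdr (which was removed from the standard library in Python 3.13), the check can be written against raw bytes. The names `SIGNATURES`, `sniff_image_type` and `sniff_file` are hypothetical helpers introduced here for illustration, and the signature table covers only a few common formats:

```python
from typing import Optional

# Leading byte signatures for a few common formats.
# This table is illustrative, not exhaustive.
SIGNATURES = (
    (b'\xff\xd8', 'jpeg'),           # JPEG start-of-image marker
    (b'\x89PNG\r\n\x1a\n', 'png'),   # PNG signature
    (b'GIF87a', 'gif'),
    (b'GIF89a', 'gif'),
)


def sniff_image_type(header: bytes) -> Optional[str]:
    """Return a format name based on the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 32 bytes of a file and sniff its type."""
    with open(path, 'rb') as f:
        return sniff_image_type(f.read(32))
```

Calling `sniff_file('/images/GrEdc')` on one of the extensionless downloads would then return a format name or None. Matching only on the first two bytes is deliberately permissive, in the same way as the `h[:2]` check in test_jpeg3.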

Answer 2

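This is a known issue in the library: it fails to detect some valid JPEG images.

You can use a modified version of the library to detect JPEG images more reliably, especially when you already know that all of the files are images.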

https://bugs.python.org/issue28591

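If some images still go undetected even with the patched library, you can try Pillow, which supports more formats. However, it is not as lightweight, and it is an external dependency that is not part of Python's standard library.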

Answer 3

I ran into this problem when creating mail attachments with the MIMEImage class, which raised the following error (included here as googlefood):

  File "/usr/lib/python2.7/email/mime/image.py", line 43, in __init__
    raise TypeError('Could not guess image MIME subtype')
TypeError: Could not guess image MIME subtype

The reason is that MIMEImage internally relies on the (buggy) imghdr.what:

    if _subtype is None:
        _subtype = imghdr.what(None, _imagedata)
    if _subtype is None:
        raise TypeError('Could not guess image MIME subtype')

I was able to work around this by using guess_type:

import os
from email.mime.image import MIMEImage
from mimetypes import guess_type

# `image` is the file name and `dirpath` its directory (both defined elsewhere)
mimetype, encoding = guess_type(image)  # guessed from the file name extension
maintype, subtype = mimetype.split('/')
with open(os.path.join(dirpath, image), 'rb') as fp:
    mimeimage = MIMEImage(fp.read(), _subtype=subtype)
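
Note that guess_type works from the file name, not the file contents, so it returns (None, None) for a name with no extension and the split above would then fail. A slightly more defensive sketch of the same workaround follows. The helper names `subtype_from_name` and `make_image_part` are hypothetical, and falling back to 'jpeg' is an assumption that only makes sense when the undetected files are known to be JPEGs:

```python
from email.mime.image import MIMEImage
from mimetypes import guess_type
from typing import Optional


def subtype_from_name(filename: str, default: Optional[str] = None) -> Optional[str]:
    """Return the image subtype implied by the file name, or `default`."""
    mimetype, _encoding = guess_type(filename)
    if mimetype is None or not mimetype.startswith('image/'):
        return default
    return mimetype.split('/', 1)[1]


def make_image_part(data: bytes, filename: str) -> MIMEImage:
    """Build a MIMEImage with an explicit subtype so no content sniffing runs."""
    # Assumption: files with no recognizable extension are treated as JPEG.
    subtype = subtype_from_name(filename, default='jpeg')
    return MIMEImage(data, _subtype=subtype)
```

Because `_subtype` is passed explicitly, MIMEImage never has to guess the type from the image data, which is the step that raised the TypeError above.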

imghdr / python - unable to detect the type of some images (image extension)