Question

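I'm working on a service written in Python that, at one point, downloads images from given URLs and stores them on the server.

The service checks the content type returned for the URL and downloads the image only if the content type is something like "image/jpeg".

I recently ran into an interesting problem with the following URL: http://www.nationaldentalreviews.org/Handlers/ImageDisplay.ashx?qUID=8597&qType=__ProfileMicroSite

When opened directly in a browser, this URL displays some kind of encoded string.

When used as the 'src' of an image tag, it renders an image.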

{{1}}

The content type returned for this URL is text/html.

In Python, is there a way for me to detect that this URL points to an image that can be used as a 'src'?

Answer 1

Retrieve the image data and inspect it with the imghdr module: https://docs.python.org/2/library/imghdr.html.
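The same idea can be sketched without imghdr, which was deprecated in Python 3.11 and removed in 3.13: compare the first bytes of the downloaded data against a few well-known file signatures. This is only a rough sketch; the name `sniff_image_type` and the signature table are assumptions made for illustration, not part of any library, and the table is far from exhaustive.

```python
# Hypothetical helper: guess an image format from its leading "magic" bytes.
# The signature table is deliberately small and not exhaustive.
_SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
)


def sniff_image_type(data):
    """Return a format name if data starts with a known signature, else None."""
    for magic, name in _SIGNATURES:
        if data.startswith(magic):
            return name
    return None
```

You would call it on the response body (the first 16 bytes or so are enough) and ignore the Content-Type header, since that header is what the server gets wrong here.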

Answer 2

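The encoded string you see is the binary content of a JPEG. The server is incorrectly setting the Content-Type header to text/html, so your browser tries to display it as HTML rather than as a JPEG.

You can download the file and try to open it with the Python Imaging Library; PIL raises an exception if the file is not an image.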

>>> from PIL import Image
>>> im = Image.open("foo.jpg")
>>> im
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=229x103 at 0x21A3300>
>>> im = Image.open("html.jpg")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python27\lib\site-packages\PIL\Image.py", line 1980, in open
    raise IOError("cannot identify image file")
IOError: cannot identify image file
>>>
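
If you would rather not write the download to disk first, the same check can probably be done in memory. The sketch below assumes Pillow (the maintained PIL fork) is installed; `is_image` is a hypothetical helper name, and `data` stands for the raw bytes of the response body.

```python
import io

from PIL import Image


def is_image(data):
    """Return True if Pillow can identify and verify data as an image."""
    try:
        # Image.open only reads the header; verify() additionally checks
        # the file for obvious corruption without decoding the pixels.
        with Image.open(io.BytesIO(data)) as im:
            im.verify()
        return True
    except Exception:
        # Pillow raises different exception types depending on the format,
        # so this sketch treats any failure as "not an image".
        return False
```

Note that an image object cannot be reused after `verify()`; reopen the data if you need to process the image afterwards.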

How can I verify whether this URL points to an image?
