Question

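I'm trying to download an image that displays as an animated GIF but is served as a JPG. I say it appears to be encoded as a JPG because both the file extension and the MIME type say so: .jpg and image/jpeg.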

After downloading the file to my local machine (Mac OS X), I get this error when I try to open it:

The file could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.

I realize some people would simply skip this image, but if it can be fixed I'm looking for a solution rather than ignoring it.

The URL in question is:

http://www.supergrove.com/wp-content/uploads/2017/03/gif-images-22-1000-about-gif-on-pinterest.jpg

Here is my code; I'm open to suggestions:

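from PIL import Image
import requests

# `media` is the image URL and `uploadedFile` is the local destination path;
# both are defined elsewhere in my script.
response = requests.get(media, stream=True)
response.raise_for_status()

# Stream the body to disk in 1 KiB chunks; the `with` block closes the file,
# so no explicit close() is needed.
with open(uploadedFile, 'wb') as img:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:
            img.write(chunk)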

Answer 1

According to WhereGoes, the image link:

http://www.supergrove.com/wp-content/uploads/2017/03/gif-images-22-1000-about-gif-on-pinterest.jpg

receives a 302 redirect to the page that contains it:

http://www.supergrove.com/gif-images/gif-images-22-1000-about-gif-on-pinterest/

So your code is actually downloading a web page and saving it as an image.
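
One way to catch this before the file reaches Preview is to look at the first bytes of the downloaded body instead of trusting the .jpg extension or the reported MIME type. The following is only a minimal sketch: `sniff_format` is a hypothetical helper name, and its signature checks cover just a few common formats.

```python
def sniff_format(data: bytes) -> str:
    """Guess the format of a downloaded body from its leading bytes.

    Hypothetical helper for illustration; it only knows a few signatures.
    """
    # GIF files start with the ASCII version tag.
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # JPEG files start with the SOI marker followed by another marker byte.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # PNG files start with a fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # An HTML page usually opens with a doctype or <html> tag,
    # possibly after some whitespace.
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

Called on the first chunk from `response.iter_content`, a result of "html" would suggest that the redirect target was saved instead of the image, while "gif" would confirm a GIF served under a .jpg name.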

I tried:

r = requests.get(the_url, headers=headers, allow_redirects=False)

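But it returns no content and status_code = 302.

(In hindsight it was obvious that this would happen...)

The server is configured so that it will never fulfill that request directly.

Bypassing that restriction sounds difficult, if not illegal, to the best of my very limited knowledge.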

Answer 2

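I ended up having to answer my own question here: the fix is to add a Referer header to the request. Most likely an htaccess file on the image server blocks direct access to certain files unless the request comes from their own site.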

from fake_useragent import UserAgent
import requests

# Set url
mediaURL = 'http://www.supergrove.com/wp-content/uploads/2017/03/gif-images-22-1000-about-gif-on-pinterest.jpg'

# Create a user agent
ua = UserAgent()

# Create a request session
s = requests.Session()

# Set some headers for the request
# (the HTTP header name is spelled 'Referer')
s.headers.update({'User-Agent': ua.chrome, 'Referer': mediaURL})


# Make the request to get the image from the url
response = s.get(mediaURL, allow_redirects=False)


# The request was about to be redirected
if response.status_code == 302:

    # Get the next location that we would have been redirected to
    location = response.headers['Location']

    # Set the previous page url as referer
    s.headers.update({'referer': location})

    # Try the request again, this time with a referer
    response = s.get(mediaURL, allow_redirects=False, cookies=response.cookies)

    print(response.headers)

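Hat tip to @raratiru for the allow_redirects hint.

Their answer also points out that the image server may be deliberately blocking access to keep generic scrapers away from its images. It's hard to say for sure, but either way this solution works.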
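
The retry logic above could also be wrapped in a small helper. This is only a sketch under assumptions: `fetch_with_referer` and `REDIRECT_CODES` are hypothetical names, `session` is assumed to be a `requests.Session` (or any object with a compatible `get` method), and whether the second request succeeds depends entirely on how the server checks the Referer.

```python
# Status codes treated as "the server tried to redirect us" (an assumption;
# the original code only checks for 302).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_with_referer(session, url):
    """Request `url` without following redirects; if the server answers with
    a redirect, retry the same URL using the redirect target as the Referer.
    """
    response = session.get(url, allow_redirects=False)
    location = response.headers.get("Location")
    if response.status_code in REDIRECT_CODES and location:
        # A requests.Session keeps cookies set by the first response,
        # so they are sent again on this second request.
        response = session.get(
            url,
            headers={"Referer": location},
            allow_redirects=False,
        )
    return response
```

With requests installed, `fetch_with_referer(requests.Session(), mediaURL)` would return the final response, whose `content` can then be written to disk in binary mode.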
