Python3: downloading an image from a URL

Posted: 2017-05-27 22:46:46

Tags: python python-3.x

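The problem I'm currently running into is downloading an image that displays as an animated GIF but seems to be encoded as a JPG. I say it seems to be encoded as a JPG because the file extension and MIME type are .jpg and image/jpeg respectively.

When I download the file to my local machine (Mac OS X) and then try to open it, I get this error: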
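The extension and the Content-Type only describe what the server claims. A quick way to see what was actually saved is to look at the file's leading "magic" bytes. A minimal sketch that only knows GIF, JPEG, PNG and a crude HTML check (`SIGNATURES` and `sniff_type` are illustrative names, not from the thread):

```python
import mimetypes

# Leading "magic" bytes of a few common formats
SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def sniff_type(data: bytes) -> str:
    """Guess a MIME type from the first bytes of a file, ignoring its name."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # Crude check for an HTML page saved under an image name
    head = data.lstrip()[:15].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    return "application/octet-stream"


name = "gif-images-22-1000-about-gif-on-pinterest.jpg"
# The extension alone says JPEG...
print(mimetypes.guess_type(name)[0])  # image/jpeg
# ...but the content decides what the file really is (sample bytes, not the real file)
print(sniff_type(b"GIF89a\x01\x00\x01\x00"))  # image/gif
print(sniff_type(b"<!DOCTYPE html><html>..."))  # text/html
```

Running `sniff_type` on the first bytes of the saved file shows whether Preview is being handed a GIF, a JPEG, or something that is not an image at all.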

The file could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.

While I realize some people might just skip this image, I'm looking for a way to fix it if possible rather than ignore it.

The URL in question is:

http://www.supergrove.com/wp-content/uploads/2017/03/gif-images-22-1000-about-gif-on-pinterest.jpg

Here is my code; I'm open to suggestions:

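import requests

# URL of the image (from the question) and a local filename to save it under (example name)
media = 'http://www.supergrove.com/wp-content/uploads/2017/03/gif-images-22-1000-about-gif-on-pinterest.jpg'
uploadedFile = 'image.jpg'

response = requests.get(media, stream=True)
response.raise_for_status()

# Write the body to disk in 1 KiB chunks; the with block closes the file
with open(uploadedFile, 'wb') as img:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:  # skip empty keep-alive chunks
            img.write(chunk)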
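One defensive tweak to the loop above (a suggestion, not something from the thread) is to inspect the first chunk before writing anything, so that an HTML page served under a .jpg URL is rejected instead of being saved. This sketch assumes the first non-empty chunk holds at least the first 8 bytes of the file, which is the normal case with `chunk_size=1024`; `save_image_stream` and `IMAGE_MAGIC` are hypothetical names:

```python
import itertools
import os
import tempfile
from typing import Iterable

# Leading bytes of GIF, JPEG and PNG files
IMAGE_MAGIC = (b"GIF87a", b"GIF89a", b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n")


def save_image_stream(chunks: Iterable[bytes], path: str) -> int:
    """Write chunks to path only if the data starts like an image; return bytes written."""
    it = (c for c in chunks if c)  # drop empty keep-alive chunks
    first = next(it, b"")
    if not first.startswith(IMAGE_MAGIC):
        # Nothing has been written yet, so no bogus .jpg file is left behind
        raise ValueError(f"not an image, data starts with {first[:16]!r}")
    written = 0
    with open(path, "wb") as f:
        for chunk in itertools.chain([first], it):
            f.write(chunk)
            written += len(chunk)
    return written


# With the question's code this would be called as, e.g.:
#   save_image_stream(response.iter_content(chunk_size=1024), uploadedFile)
out = os.path.join(tempfile.mkdtemp(), "out.gif")
print(save_image_stream([b"GIF89a", b"", b"\x01\x00"], out))  # 8
try:
    save_image_stream([b"<!DOCTYPE html><html>"], out)
except ValueError as err:
    print(err)
```

This does not make the download succeed; it only turns a silent "damaged file" into an explicit error at download time.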

2 answers:

Answer 0 (score: 1)

According to Wheregoes, the image link:

  • http://www.supergrove.com/wp-content/uploads/2017/03/gif-images-22-1000-about-gif-on-pinterest.jpg

receives a 302 redirect to the page that contains it:

  • http://www.supergrove.com/gif-images/gif-images-22-1000-about-gif-on-pinterest/

So your code ends up downloading the web page and saving it as the image.

I tried

r = requests.get(the_url, headers=headers, allow_redirects=False)

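but it returns no content and status_code = 302

(which, to be fair, is obviously what should happen...)

This server is configured so that it will never fulfil that request.

Getting around that restriction sounds difficult, if not illegal, as far as my (very limited) knowledge goes.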
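With `allow_redirects=False` the 302 response has an empty body; the useful part is its `Location` header, which says where the server wants to send you. A small standard-library helper (hypothetical name `redirect_target`) that turns such a response into the absolute URL of the next hop, including the case of a relative `Location`. With requests itself, `response.is_redirect` and `response.headers['Location']` carry the same information.

```python
from typing import Mapping, Optional
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(request_url: str, status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Absolute URL a 3xx response points to, or None if it is not a redirect."""
    if status not in REDIRECT_CODES:
        return None
    # Header names are case-insensitive, so don't rely on the exact spelling
    location = next((v for k, v in headers.items() if k.lower() == "location"), None)
    if location is None:
        return None
    # Location may be relative; resolve it against the URL that was requested
    return urljoin(request_url, location)


image_url = "http://www.supergrove.com/wp-content/uploads/2017/03/gif-images-22-1000-about-gif-on-pinterest.jpg"
# A relative Location header is assumed here purely for illustration
print(redirect_target(image_url, 302, {"Location": "/gif-images/gif-images-22-1000-about-gif-on-pinterest/"}))
# http://www.supergrove.com/gif-images/gif-images-22-1000-about-gif-on-pinterest/
print(redirect_target(image_url, 200, {}))  # None
```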

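Answer 1 (score: 1)

I ended up answering my own question here: the fix is to add a referer to the request. Most likely an .htaccess file on the image server blocks some direct file access unless the request comes from their own site.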
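The thread never shows the server's configuration, so the function below is only a guess at the kind of check a referer-based .htaccess hotlink rule performs, written in plain Python to make the logic concrete: the file is served only if the `Referer` header names one of the site's own hosts. `referer_allowed` and `allowed_hosts` are hypothetical; many real rules also let an empty Referer through, but this sketch rejects it because requests without a referer were the ones being redirected here.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def referer_allowed(referer: Optional[str], allowed_hosts: Iterable[str]) -> bool:
    """Hypothetical hotlink rule: serve only when Referer points at one of our own hosts."""
    if not referer:
        return False  # no Referer at all is treated as a direct/hotlinked request
    host = (urlsplit(referer).hostname or "").lower()
    return host in {h.lower() for h in allowed_hosts}


site = ["www.supergrove.com"]
print(referer_allowed(None, site))  # False
print(referer_allowed("http://www.supergrove.com/gif-images/some-page/", site))  # True
print(referer_allowed("http://elsewhere.example/", site))  # False
```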

from fake_useragent import UserAgent  # third-party package: pip install fake-useragent
import requests

# Set url
mediaURL = 'http://www.supergrove.com/wp-content/uploads/2017/03/gif-images-22-1000-about-gif-on-pinterest.jpg'

# Create a user agent
ua = UserAgent()

# Create a request session (it keeps headers and cookies between requests)
s = requests.Session()

# Set some headers for the request (the HTTP header is spelled "Referer")
s.headers.update({'User-Agent': ua.chrome, 'Referer': mediaURL})


# Make the request to get the image from the url, without following redirects
response = s.get(mediaURL, allow_redirects=False)


# The request was about to be redirected
if response.status_code == 302:

    # Get the next location that we would have been redirected to
    location = response.headers['Location']

    # Set the previous page url as referer
    s.headers.update({'Referer': location})

    # Try the request again, this time with a referer
    response = s.get(mediaURL, allow_redirects=False, cookies=response.cookies)

print(response.status_code, response.headers.get('Content-Type'))
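To try the same two-step flow without hitting the real site, here is a self-contained sketch using only the standard library. The local mock server is entirely hypothetical (its paths, its payload and its rule of requiring any Referer are assumptions, not the real site's configuration): it 302-redirects image requests that lack a Referer and serves the bytes otherwise. The client uses `http.client`, which never follows redirects on its own, so it plays the role of `allow_redirects=False`, and retries with the redirect target as the Referer, just as the code above does.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin

# Everything about this mock server is hypothetical: paths, payload and the rule itself
IMAGE_PATH = "/wp-content/uploads/example.jpg"
PAGE_PATH = "/gif-images/example/"
GIF_BYTES = b"GIF89a\x01\x00\x01\x00\x00\x00\x00;"  # truncated stand-in bytes


class HotlinkHandler(BaseHTTPRequestHandler):
    """Mock server: redirects image requests that carry no Referer header."""

    def do_GET(self):
        if self.path != IMAGE_PATH:
            self.send_error(404)
        elif not self.headers.get("Referer"):
            self.send_response(302)
            self.send_header("Location", PAGE_PATH)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_response(200)
            # GIF data labelled as JPEG, mirroring what the question reports
            self.send_header("Content-Type", "image/jpeg")
            self.send_header("Content-Length", str(len(GIF_BYTES)))
            self.end_headers()
            self.wfile.write(GIF_BYTES)

    def log_message(self, *args):
        pass  # silence per-request logging


def get(host, port, path, headers):
    """Single GET via http.client, which does not follow redirects by itself."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()


def fetch_with_referer_retry(host, port, path):
    """Try without a Referer; on a 302, retry with the redirect target as Referer."""
    url = f"http://{host}:{port}{path}"
    status, location, body = get(host, port, path, {})
    if status == 302 and location:
        status, _, body = get(host, port, path, {"Referer": urljoin(url, location)})
    return status, body


def demo():
    # Port 0 lets the OS pick a free port on localhost
    server = HTTPServer(("127.0.0.1", 0), HotlinkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return fetch_with_referer_retry(host, port, IMAGE_PATH)
    finally:
        server.shutdown()
        server.server_close()


status, body = demo()
print(status, body[:6])  # 200 b'GIF89a'
```

The mock only approximates the behaviour both answers describe; whether a given real server accepts a particular Referer can only be checked against that server.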

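Thanks to @raratiru for the allow_redirects tip.

Their answer also pointed out that the image server may be deliberately blocking access to stop generic scrapers from grabbing their images. Hard to say, but either way, this solution works.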