Question

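I'm looking for the proper way to download a file from a URL, save it to disk, and work out the file name from either the URL or the response headers.

The solution can be in Python, Node, Ruby, or PHP. Any one of these is fine with me.

A naive implementation that guesses the file name from the URL is easy, but I need this to work even when there are redirects and the file name is not in the URL.

Here are some example URLs and the file names I expect: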

Example: file name in the URL

URL: http://static.guim.co.uk/sys-images/Guardian/Pix/pictures/2010/4/14/1271276213693/Snoop-Dogg-in-2004-001.jpg
Expected file name: Snoop-Dogg-in-2004-001.jpg

Example: file name in the URL plus query parameters

URL: http://i.imgur.com/mW7vW4j.gif?go=true
Expected file name: mW7vW4j.gif
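For the first two examples the name can be read straight from the URL. A minimal Python sketch, assuming the last path segment is the file name: `urllib.parse.urlsplit` separates the query string from the path, so `?go=true` is dropped automatically. The function name `filename_from_url` is hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Hypothetical helper: last path segment of url, ignoring query/fragment.

    Returns None when the path is empty or ends in "/".
    """
    path = urlsplit(url).path                 # query string and fragment excluded
    name = posixpath.basename(unquote(path))  # decode %20 etc., keep final segment
    return name or None


print(filename_from_url(
    "http://static.guim.co.uk/sys-images/Guardian/Pix/pictures/2010/4/14/"
    "1271276213693/Snoop-Dogg-in-2004-001.jpg"))  # Snoop-Dogg-in-2004-001.jpg
print(filename_from_url("http://i.imgur.com/mW7vW4j.gif?go=true"))  # mW7vW4j.gif
```

For the SoundCloud URL below, this returns `download`, which is why the redirect case has to look at the response instead.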

Example: redirect, file name in the header

URL: https://api.soundcloud.com/tracks/183721111/download?client_id=b45b1aa10f1ac2941910a7f0d10f8e28
Expected file name: I Might ft.P-Lo & K Camp.mp3
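In the redirect case the name arrives in the `Content-Disposition` response header (for example `attachment; filename="..."`) rather than in the URL. Python's `cgi.parse_header` was removed in Python 3.13, so this sketch borrows the standard-library `email.message.Message` parser, whose `get_filename()` handles quoting and the RFC 2231 `filename*=` form. The helper name is hypothetical, and the header value below is illustrative, not captured from SoundCloud. If a server sends both `filename` and `filename*`, this parser may return the plain `filename`, whereas RFC 6266 recommends preferring `filename*`.

```python
import posixpath
from email.message import Message


def filename_from_content_disposition(header):
    """Hypothetical helper: filename parameter of a Content-Disposition value.

    Returns None if the header is missing or carries no filename.
    """
    if not header:
        return None
    msg = Message()
    msg["Content-Disposition"] = header
    name = msg.get_filename()  # unquotes and decodes RFC 2231 filename*=
    if not name:
        return None
    # Never trust a server-supplied path: keep only the final component
    return posixpath.basename(name.replace("\\", "/")) or None


# Illustrative header value (assumed)
print(filename_from_content_disposition(
    'attachment; filename="I Might ft.P-Lo & K Camp.mp3"'))  # I Might ft.P-Lo & K Camp.mp3
```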

There is more information about the redirect case here: Ruby - how to download a file if the url is a redirection?

Answer 1

Ruby, using the Mechanize gem, simple case:

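require 'mechanize'

url = 'http://i.imgur.com/mW7vW4j.gif?go=true'  # any of the example URLs
agent = Mechanize.new
# Follows redirects; the name comes from Content-Disposition or the URL
agent.get(url).save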

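This even follows redirects and saves the file under the correct name. It turns the HTTP query string in the second example into a valid file name. If you would rather drop the query string (warning: it may be needed to identify a unique resource), you can adjust it like this: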
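If you want to keep the query string but still get a file name that every common OS accepts, you have to sanitize it yourself in other languages. A minimal Python sketch of that kind of conversion; it is not Mechanize's actual rule set, and both the character set (those Windows forbids, plus control characters) and the `"download"` fallback are assumptions. `safe_filename` is a hypothetical name.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
# Assumption: replacing them with "_" is acceptable for your use case.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name, replacement="_"):
    """Hypothetical helper: make name usable as a file name on common OSes."""
    cleaned = _UNSAFE.sub(replacement, name)
    cleaned = cleaned.strip(" .")  # Windows also rejects trailing dots/spaces
    return cleaned or "download"   # assumed fallback when nothing is left


print(safe_filename("mW7vW4j.gif?go=true"))  # mW7vW4j.gif_go=true
```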

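require 'mechanize'
require 'uri'

agent = Mechanize.new
uri = URI.parse(url)
file = agent.get(url)
if uri.query.nil? || file.response['content-disposition']
  # No query string, or the server named the file: keep Mechanize's choice
  file.save
else
  # Drop the query string and name the file after the last path segment
  file.save_as(File.basename(uri.path))
end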

Answer 2

Using the Python requests module:

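import os

import requests

url = "http://static.guim.co.uk/sys-images/Guardian/Pix/pictures/2010/4/14/1271276213693/Snoop-Dogg-in-2004-001.jpg"
# stream=True downloads the body in chunks; redirects are followed
resp = requests.get(url, stream=True, allow_redirects=True)
resp.raise_for_status()  # fail before creating an empty file on 4xx/5xx

# File name = last path segment of the final (post-redirect) URL, query removed
filename = resp.url.split('/')[-1].split('?')[0]

savepath = ''  # set the folder to save to
filepath = os.path.join(savepath, filename)

with open(filepath, 'wb') as out_file:
    for chunk in resp.iter_content(1024):
        if chunk:  # skip keep-alive chunks
            out_file.write(chunk)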
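This answer takes the name only from the final URL, so when a server supplies the name in `Content-Disposition` (the redirect example) it would be missed. A minimal sketch, assuming you pass it `resp.url` and `resp.headers` from the requests code above: it prefers the header, then the last path segment of the final URL, then a default. `pick_filename` and the `"download"` default are hypothetical.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def pick_filename(final_url, headers, default="download"):
    """Hypothetical helper: choose a save name for a downloaded response.

    Order: Content-Disposition filename, then the last path segment of the
    final (post-redirect) URL, then `default`.
    """
    header = headers.get("Content-Disposition")
    if header:
        msg = Message()
        msg["Content-Disposition"] = header
        name = msg.get_filename()
        if name:
            # Strip any directory part the server may have sent
            name = posixpath.basename(name.replace("\\", "/"))
            if name:
                return name
    name = posixpath.basename(unquote(urlsplit(final_url).path))
    return name or default
```

Usage: `filename = pick_filename(resp.url, resp.headers)`. requests exposes `resp.headers` as a case-insensitive dict, so the lookup matches any header casing; a plain dict needs the exact key.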

How to download a file from a URL to disk and guess the file name
