Question

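I'm using the Python requests lib to get a PDF file from the web. This works fine, but I now also want the original filename. If I go to a PDF file in Firefox and click download, it already has a filename defined to save the PDF under. How do I get this filename?

For example: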

import requests
r = requests.get('http://www.researchgate.net/profile/M_Gotic/publication/260197848_Mater_Sci_Eng_B47_%281997%29_33/links/0c9605301e48beda0f000000.pdf')
print(r.headers['content-type'])  # prints 'application/pdf'

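I checked r.headers for anything interesting, but there is no filename in there. I was actually hoping for something like r.filename.

Does anybody know how I can get the filename of a downloaded PDF file with the requests library?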

Answer 1

It is specified in the HTTP header Content-Disposition. So to extract the name you would do:

import re
d = r.headers['content-disposition']
fname = re.findall("filename=(.+)", d)[0]  # findall returns a list; take the first match

The name is extracted from the string via a regular expression (the re module).
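The regular expression above keeps any quotes around the name and does not match the RFC 2231 `filename*=` form. As one possible alternative, the standard library's email.message.Message can parse the header value. This is only a sketch: the helper name filename_from_content_disposition and the sample header strings are made up for illustration.

```python
from email.message import Message


def filename_from_content_disposition(value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not value:
        return None
    msg = Message()
    msg["Content-Disposition"] = value
    # get_filename() reads the 'filename' parameter and strips surrounding quotes.
    return msg.get_filename()


# Hypothetical header values, not taken from a real response.
print(filename_from_content_disposition('attachment; filename="sample.pdf"'))
print(filename_from_content_disposition("inline"))
```

With a requests response this could be called as filename_from_content_disposition(r.headers.get("content-disposition")).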

Answer 2

Apparently, for this particular resource it is in:

r.headers['content-disposition']

However, I don't know whether that is always the case.

Answer 3

A simple Python 3 implementation to get the filename from Content-Disposition:

import requests
response = requests.get(<your-url>)
print(response.headers.get("Content-Disposition").split("filename=")[1])
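The one-liner above raises an AttributeError when the header is missing and leaves quotes around the name. A slightly more defensive variant might look like the sketch below; the helper name filename_from_header and the sample values are assumptions for illustration, and a quoted filename that itself contains a semicolon would still be cut short.

```python
def filename_from_header(header_value):
    """Return the filename from a Content-Disposition value, or None if absent."""
    if not header_value or "filename=" not in header_value:
        return None
    # Take the text after 'filename=', stop at the next parameter, drop quotes.
    raw = header_value.split("filename=", 1)[1]
    return raw.split(";", 1)[0].strip().strip('"')


# Hypothetical header values, not taken from a real response.
print(filename_from_header('attachment; filename="sample.pdf"'))
print(filename_from_header(None))
```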

Answer 4

Building on some of the other answers, here's how I do it. If there isn't a Content-Disposition header, I parse the name from the download URL.

import re
import requests
from requests.exceptions import RequestException


url = 'http://www.example.com/downloads/sample.pdf'

try:
    with requests.get(url) as r:

        fname = ''
        if "Content-Disposition" in r.headers.keys():
            fname = re.findall("filename=(.+)", r.headers["Content-Disposition"])[0]
        else:
            fname = url.split("/")[-1]

        print(fname)
except RequestException as e:
    print(e)

Arguably there are better ways to parse the URL string, but for simplicity I didn't want to involve any more libraries.
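For the URL fallback, url.split("/")[-1] would keep any query string and percent-encoding in the result. One way to avoid that with only the standard library is sketched here; the helper name filename_from_url and the example URLs are made up for illustration.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, without query string or fragment."""
    path = urlsplit(url).path           # drops '?query' and '#fragment'
    return unquote(posixpath.basename(path))  # decodes %20 and similar escapes


# Hypothetical URLs for illustration only.
print(filename_from_url("http://www.example.com/downloads/sample.pdf?token=abc"))
print(filename_from_url("http://www.example.com/downloads/My%20File.pdf"))
```

The function returns an empty string when the URL path ends in a slash, so a caller may still want a default name for that case.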

Answer 5

You can use werkzeug to parse the options header: https://werkzeug.palletsprojects.com/en/0.15.x/http/#werkzeug.http.parse_options_header
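As a rough sketch of how werkzeug's parse_options_header could be applied to a Content-Disposition value (this assumes werkzeug is installed; the header string is a made-up example rather than a real response):

```python
from werkzeug.http import parse_options_header

# Hypothetical header value; with requests this would come from
# r.headers.get("content-disposition").
header_value = 'attachment; filename="sample.pdf"'

# Returns the main value and a dict of the parameters that follow it.
disposition, options = parse_options_header(header_value)
filename = options.get("filename")  # None when no filename parameter is present

print(disposition, filename)
```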
