Question

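I'm trying to download all the PDFs linked from a website and combine them into a single file. I already have a list of all the PDF URLs. How can I download every PDF and merge them together? My code is below. I'm using Python 2.7.8.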

# Download the PDFs (Python 2.7)
import os
import urllib2

for url in listofurl:
    outfile = os.path.basename(url)
    # PDFs are binary data, so open the file in 'wb' mode, not 'w'
    with open(outfile, 'wb') as out:
        out.write(urllib2.urlopen(url).read())
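One caveat with os.path.basename(url): it keeps any query string (a URL ending in report.pdf?dl=1 produces a file that no longer matches *.pdf), and a later glob of *.pdf lists files alphabetically rather than in the order of the URL list. A minimal Python 3 sketch of a hypothetical helper, local_name, that drops the query and prefixes a zero-padded index so alphabetical order matches list order (on Python 2.7, urlsplit lives in the urlparse module instead):

```python
import os
from urllib.parse import urlsplit


def local_name(index, url, width=3):
    """Hypothetical helper: local file name for the index-th URL.

    The query string and fragment are dropped ('a.pdf?dl=1' -> 'a.pdf'),
    and the zero-padded index prefix makes alphabetical order equal to
    the order of the URL list.
    """
    name = os.path.basename(urlsplit(url).path) or "download.pdf"
    return f"{index:0{width}d}_{name}"


urls = [
    "http://example.com/docs/intro.pdf?dl=1",
    "http://example.com/appendix.pdf",
]
for i, u in enumerate(urls):
    print(local_name(i, u))
# 000_intro.pdf
# 001_appendix.pdf
```

The width of 3 is an arbitrary assumption; it only needs to be large enough for the number of URLs you have.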

Answer 1

For me the download works, but at one point it throws an exception because a file can't be found:

HTTPError: HTTP Error 404: Not Found
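Since the HTTPError is raised by urlopen, one way to keep going past a missing file is to catch it per URL and move on. A minimal Python 3 sketch; the function name download_all and its fetch and dest_dir parameters are hypothetical (fetch defaults to urllib.request.urlopen and exists only so the loop can be tried without network access). On Python 2.7 the same exception is urllib2.HTTPError.

```python
import os
import urllib.error
import urllib.request


def download_all(urls, fetch=urllib.request.urlopen, dest_dir="."):
    """Download each URL into dest_dir, skipping URLs the server reports
    as an HTTP error (e.g. 404). Returns the saved paths in URL order."""
    saved = []
    for url in urls:
        path = os.path.join(dest_dir, os.path.basename(url))
        try:
            response = fetch(url)
        except urllib.error.HTTPError as err:
            # Report and skip this URL instead of aborting the whole loop
            print(f"skipping {url}: HTTP {err.code}")
            continue
        # Binary mode, because PDFs are not text
        with response, open(path, "wb") as out:
            out.write(response.read())
        saved.append(path)
    return saved
```

Only HTTP errors are skipped here; other failures such as DNS errors raise urllib.error.URLError (the base class of HTTPError), which you could catch instead if you want to skip those too.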

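I'm not sure whether Python itself can merge the files. I'd suggest using "pdftk" and calling it through the "subprocess" module once the files are on your hard drive.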

On a Linux system, as long as "pdftk" is installed (an external and very handy command-line tool for merging PDFs), it works like this:

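import glob
from subprocess import call

# call() with a list does not go through a shell, so '*.pdf' would not be
# expanded; expand the pattern in Python instead (alphabetical order)
pdfs = sorted(f for f in glob.glob('*.pdf') if f != 'combined.pdf')
call(['pdftk'] + pdfs + ['cat', 'output', 'combined.pdf'])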
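A shell would expand *.pdf, but a list passed to subprocess does not go through a shell, and even an expanded pattern yields files in alphabetical order rather than the order of the URL list. A minimal Python 3 sketch that builds the pdftk command from an explicit, ordered list of files instead; the function names pdftk_cat_command and merge_with_pdftk are hypothetical, and it assumes pdftk is on PATH:

```python
import os
import shutil
import subprocess


def pdftk_cat_command(inputs, output="combined.pdf"):
    """Build a pdftk argument list that concatenates inputs, in the given
    order, into output. The output file is filtered out of the inputs so a
    rerun does not merge the previous result into itself."""
    out_abs = os.path.abspath(output)
    files = [f for f in inputs if os.path.abspath(f) != out_abs]
    if not files:
        raise ValueError("no input PDFs to merge")
    return ["pdftk", *files, "cat", "output", output]


def merge_with_pdftk(inputs, output="combined.pdf"):
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk was not found on PATH")
    # check_call raises CalledProcessError if pdftk exits with an error
    subprocess.check_call(pdftk_cat_command(inputs, output))
```

Passing the list of files you saved, in the order you downloaded them, keeps the merged PDF in the same order as your URL list.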

This isn't the slickest way, but it's the easiest one I can think of right now. Hope it helps.
