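I want to take a multi-page PDF file and create a separate PDF file for each page.
I have downloaded reportlab and browsed the documentation, but it seems to be aimed at PDF generation. I haven't yet seen anything about processing existing PDF files.
Is there a simple way to do this in Python?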
Answer 0 (score: 110)
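from PyPDF2 import PdfFileWriter, PdfFileReader

# Legacy PyPDF2 names; newer releases call these PdfReader and PdfWriter.
inputpdf = PdfFileReader(open("document.pdf", "rb"))

for i in range(inputpdf.numPages):
    # A fresh writer per page, so each output file holds exactly one page.
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    # i is 0-based, so the first file is document-page0.pdf.
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)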
etc.
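The class and method names in this answer come from older PyPDF2 releases. As a minimal sketch, assuming the `pypdf` package (the successor to PyPDF2) is installed, the same per-page loop could look like the following. The names `page_filename` and `split_pages` are hypothetical helpers introduced here, not part of the answer or of the library, and the zero-padded, 1-based file names are a design choice of this sketch rather than what the answer produces.

```python
from pathlib import Path


def page_filename(stem, index, total):
    """Return a zero-padded, 1-based file name for the 0-based page `index`."""
    width = len(str(total))  # e.g. 12 pages -> pad to 2 digits
    return f"{stem}-page{index + 1:0{width}d}.pdf"


def split_pages(path, out_dir="."):
    """Write every page of `path` to its own PDF inside `out_dir`."""
    # Imported lazily so page_filename() works even without pypdf installed.
    # Assumption: the newer PdfReader/PdfWriter API is available.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(path)
    total = len(reader.pages)
    stem = Path(path).stem
    written = []
    for index, page in enumerate(reader.pages):
        writer = PdfWriter()  # one writer per output file
        writer.add_page(page)
        target = Path(out_dir) / page_filename(stem, index, total)
        with open(target, "wb") as stream:
            writer.write(stream)
        written.append(target)
    return written
```

Calling `split_pages("document.pdf")` would then write `document-page01.pdf`, `document-page02.pdf`, and so on for a 12-page input. Padding the number keeps the files in page order when a directory listing sorts them by name.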
Answer 1 (score: 3)
The PyPDF2 package lets you split a single PDF into multiple PDFs.
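import os
from PyPDF2 import PdfFileReader, PdfFileWriter

path = 'document.pdf'  # placeholder: the answer leaves `path` undefined
# Assumed definition: the answer uses `fname` without defining it.
fname = os.path.splitext(os.path.basename(path))[0]

pdf = PdfFileReader(path)
for page in range(pdf.getNumPages()):
    pdf_writer = PdfFileWriter()
    pdf_writer.addPage(pdf.getPage(page))

    # page is 0-based; the file names are 1-based.
    output_filename = '{}_page_{}.pdf'.format(fname, page+1)
    with open(output_filename, 'wb') as out:
        pdf_writer.write(out)
    print('Created: {}'.format(output_filename))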
Source: https://www.blog.pythonlibrary.org/2018/04/11/splitting-and-merging-pdfs-with-python/
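Answer 2 (score: 2)
I was missing a solution here that splits the PDF into two parts which together contain all the pages, so I'm adding mine in case anyone is looking for the same thing: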
from PyPDF2 import PdfFileWriter, PdfFileReader

def split_pdf_to_two(filename, page_number):
    pdf_reader = PdfFileReader(open(filename, "rb"))
    try:
        # page_number is the 0-based index of the first page of part 2.
        assert page_number < pdf_reader.numPages
        pdf_writer1 = PdfFileWriter()
        pdf_writer2 = PdfFileWriter()

        for page in range(page_number):
            pdf_writer1.addPage(pdf_reader.getPage(page))

        for page in range(page_number, pdf_reader.getNumPages()):
            pdf_writer2.addPage(pdf_reader.getPage(page))

        with open("part1.pdf", 'wb') as file1:
            pdf_writer1.write(file1)

        with open("part2.pdf", 'wb') as file2:
            pdf_writer2.write(file2)

    except AssertionError as e:
        print("Error: The PDF you are cutting has fewer pages than you want to cut!")
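The two-way split generalizes to any number of cut points. The following is a rough sketch of that idea, again assuming the `pypdf` package with its newer `PdfReader`/`PdfWriter` names. `split_ranges` and `split_pdf_at` are hypothetical names introduced for illustration. Unlike the function above, this sketch raises `ValueError` instead of using `assert` (assertions are skipped when Python runs with `-O`), and it rejects a cut at page 0 so that no part is empty.

```python
def split_ranges(num_pages, cut_points):
    """Return 0-based, half-open (start, stop) page ranges, one per part."""
    cuts = sorted(set(cut_points))
    if any(cut <= 0 or cut >= num_pages for cut in cuts):
        raise ValueError("every cut point must satisfy 0 < cut < num_pages")
    bounds = [0] + cuts + [num_pages]
    return list(zip(bounds[:-1], bounds[1:]))


def split_pdf_at(filename, cut_points):
    """Write part1.pdf, part2.pdf, ... split at the given 0-based page indices."""
    # Imported lazily so split_ranges() can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(filename)
    ranges = split_ranges(len(reader.pages), cut_points)
    outputs = []
    for part, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        name = f"part{part}.pdf"
        with open(name, "wb") as stream:
            writer.write(stream)
        outputs.append(name)
    return outputs
```

With a single cut point, `split_pdf_at("document.pdf", [3])` writes the same two files as `split_pdf_to_two("document.pdf", 3)`: pages 0 to 2 in `part1.pdf` and the rest in `part2.pdf`.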
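Answer 3 (score: 1)
I know this code isn't Python, but I wanted to post this piece of R code, which is simple, flexible, and works remarkably well. The pdftools package in R makes it very easy to split up a merged PDF.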
library(pdftools)  # R package
pdf_subset('D:\\file\\20.02.20\\22 GT 2017.pdf',
           pages = 1:51, output = "subset.pdf")
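For readers who want to stay in Python, a comparable page-range extraction can be sketched as follows. This is only an approximation of the R call above, assuming the `pypdf` package is installed. `to_zero_based` and `subset_pdf` are hypothetical names, not library functions. R's `1:51` is 1-based and inclusive, while Python indexes pages from 0, so the sketch converts between the two explicitly.

```python
def to_zero_based(first, last, total):
    """Convert a 1-based inclusive page span (like R's 1:51) to 0-based indices."""
    if not 1 <= first <= last <= total:
        raise ValueError(f"need 1 <= first <= last <= {total}")
    return range(first - 1, last)


def subset_pdf(src, first, last, output="subset.pdf"):
    """Copy pages first..last (1-based, inclusive) of `src` into `output`."""
    # Imported lazily so to_zero_based() can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for index in to_zero_based(first, last, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(output, "wb") as stream:
        writer.write(stream)
    return output
```

For example, `subset_pdf("input.pdf", 1, 51)` would write the first 51 pages of a hypothetical `input.pdf` to `subset.pdf`.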