Merging PDF pages with PyPDF2 fails

Time: 2018-05-20 20:09:00

Tags: python pdf pypdf pypdf2

Using these demo files:

  test.pdf: "Hello"
  tomerge1.pdf: " 1"
  tomerge2.pdf: " 2"

In output.pdf, I would like:

  • Page 1: page 1 of test.pdf merged with page 1 of tomerge1.pdf, i.e. "Hello 1"
  • Page 2: page 1 of test.pdf merged with page 1 of tomerge2.pdf, i.e. "Hello 2"

Here is what I use:

from PyPDF2 import PdfFileWriter, PdfFileReader

outputpdf = PdfFileWriter()
inputpdf = PdfFileReader(open("test.pdf", "rb"))
tomerge1 = PdfFileReader(open("tomerge1.pdf", "rb"))
tomerge2 = PdfFileReader(open("tomerge2.pdf", "rb"))

page = inputpdf.getPage(0)
page.mergePage(tomerge1.getPage(0))
outputpdf.addPage(page)

# exit()
# if we stop here, the output is "Hello 1", which is good
# Why isn't "Hello 1" remembered here?
# del page    # doesn't change anything

page = inputpdf.getPage(0)
page.mergePage(tomerge2.getPage(0))
outputpdf.addPage(page)

with open("output.pdf", "wb") as f:
    outputpdf.write(f)

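Sadly, it doesn't work: instead of "Hello 1" / "Hello 2", the output is "Hello 2" / "Hello 2".

Question: how do I get the expected behavior? (Without the file size growing quickly when there are 10 or 20 pages.)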
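
One plausible reading of this result, shown as a toy model in plain Python rather than with PyPDF2 itself: if a reader caches its page objects and merging modifies a page in place, then both getPage(0) calls hand back the same object, and both output pages end up being that one object. ToyReader and merge_page are hypothetical stand-ins invented for this sketch; that PyPDF2 behaves this way internally is an assumption here, not something the question confirms.

```python
class ToyReader:
    """Hypothetical stand-in for a reader that builds its pages once and caches them."""

    def __init__(self, texts):
        # Each page is a dict holding the layers drawn on it
        self._pages = [{"layers": [text]} for text in texts]

    def get_page(self, index):
        # Returns the cached object itself, not a fresh copy
        return self._pages[index]


def merge_page(page, overlay):
    # Modifies the page in place by adding the overlay as another layer
    page["layers"].append(overlay)


reader = ToyReader(["Hello"])
output = []

first = reader.get_page(0)
merge_page(first, " 1")
output.append(first)

second = reader.get_page(0)   # same object as `first`
merge_page(second, " 2")
output.append(second)

print(first is second)          # True
print(output[0] is output[1])   # True: both output pages are one object
```

In this model, the second merge changes the page that was already added to the output, which would explain why the two output pages come out identical.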

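1 answer:

Answer 0: (score: 1)

When I did a similar exercise, I found that each page object should be read once and merged once. One way to do that is to open two readers on the input file being merged ("test.pdf"), one for each merge. Example code: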

from PyPDF2 import PdfFileWriter, PdfFileReader

addressfile = open("Documents/addresses.pdf","rb")
xwfile = "Downloads/input.pdf"
crosswordfile = open(xwfile,"rb")
# Two readers on the same file, so that each merge gets its own page object
xword = PdfFileReader(crosswordfile)
xw2 = PdfFileReader(crosswordfile)
addr = PdfFileReader(addressfile)
xwpage = xword.getPage(0)
addpage1 = addr.getPage(1)
addpage2 = addr.getPage(2)
pdfWriter = PdfFileWriter()
xp2 = xw2.getPage(0)
# Merge a different address page onto each copy of the crossword page
xwpage.mergePage(addpage1)
xp2.mergePage(addpage2)
res = open("/home/paula/xw.pdf",'wb')
pdfWriter.addPage(xwpage)
pdfWriter.addPage(xp2)
pdfWriter.write(res)
res.close()
crosswordfile.close()
addressfile.close()

So in your code this would be:

from PyPDF2 import PdfFileWriter, PdfFileReader

testfile = open("test.pdf", "rb")
outputpdf = PdfFileWriter()
inputpdf1 = PdfFileReader(testfile)
inputpdf2 = PdfFileReader(testfile)
tomerge1 = PdfFileReader(open("tomerge1.pdf", "rb"))
tomerge2 = PdfFileReader(open("tomerge2.pdf", "rb"))

page1 = inputpdf1.getPage(0)
page1.mergePage(tomerge1.getPage(0))
outputpdf.addPage(page1)

# exit()
# No need to stop here: the output will have both "Hello 1" and "Hello 2".
# Using two readers for the same file makes PyPDF2 treat them as two
# different files, i.e. as if we were merging from two separate sources.

page2 = inputpdf2.getPage(0)
page2.mergePage(tomerge2.getPage(0))
outputpdf.addPage(page2)

with open("output.pdf", "wb") as f:
    outputpdf.write(f)
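
With 10 or 20 overlays, opening one reader per merge gets unwieldy. A possible alternative, offered only as a sketch: shallow-copy the base page before each merge so that every overlay lands on its own page object. merge_onto_copies and build_output are hypothetical helper names, not part of PyPDF2. The sketch assumes the legacy PyPDF2 1.x API used above (getPage, mergePage, addPage) and assumes that mergePage replaces the page's top-level entries rather than editing shared objects, which is why a shallow copy is expected to be enough. Its effect on output size has not been checked here.

```python
import contextlib
import copy


def merge_onto_copies(base_page, overlay_pages):
    """Return one merged page per overlay, leaving base_page itself unmodified."""
    merged = []
    for overlay in overlay_pages:
        page = copy.copy(base_page)   # shallow copy: a distinct page object
        page.mergePage(overlay)       # legacy PyPDF2 1.x method name
        merged.append(page)
    return merged


def build_output(base_path, overlay_paths, output_path):
    """Hypothetical driver: page 1 of base_path merged with page 1 of each overlay."""
    # Imported here so that merge_onto_copies stays usable without PyPDF2
    from PyPDF2 import PdfFileReader, PdfFileWriter

    with contextlib.ExitStack() as stack:
        # Input files must stay open until the writer has finished writing
        base = PdfFileReader(stack.enter_context(open(base_path, "rb")))
        overlays = [
            PdfFileReader(stack.enter_context(open(path, "rb"))).getPage(0)
            for path in overlay_paths
        ]
        writer = PdfFileWriter()
        for page in merge_onto_copies(base.getPage(0), overlays):
            writer.addPage(page)
        with open(output_path, "wb") as f:
            writer.write(f)
```

For the files in the question, the call would be build_output("test.pdf", ["tomerge1.pdf", "tomerge2.pdf"], "output.pdf").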