Merging PDF files with Python 3

Asked: 2013-02-22 10:24:41

Tags: python pdf python-3.x

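I am writing a small script that needs to merge many single-page PDF files. I want the script to run under Python 3 and to have as few dependencies as possible.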

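For the PDF merging part, I tried using PyPdf. However, its Python 3 support seems to be buggy: it cannot handle PDF files generated by Inkscape, which I need. I installed the current Git version of PyPdf, and the following test script does not work: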

import PyPDF2

output_pdf = PyPDF2.PdfFileWriter()

with open("testI.pdf", "rb") as input:
    input_pdf = PyPDF2.PdfFileReader(input)
    output_pdf.addPage(input_pdf.getPage(0))

with open("test.pdf", "wb") as output:
    output_pdf.write(output)

It throws the following stack trace:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    output.addPage(input.getPage(0))
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 420, in getPage
    self._flatten()
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 574, in _flatten
    self._flatten(page.getObject(), inherit)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 165, in getObject
    return self.pdf.getObject(self).getObject()
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 616, in getObject
    retval = readObject(self.stream, self)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 526, in readFromStream
    value = readObject(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 57, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 152, in readFromStream
    obj = readObject(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 86, in readObject
    return NumberObject.readFromStream(stream)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 231, in readFromStream
    return FloatObject(name.decode("ascii"))
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 207, in __new__
    return decimal.Decimal.__new__(cls, str(value), context)
TypeError: optional argument must be a context

However, the same script works perfectly with Python 2.7.

What am I doing wrong here? Is this a bug in the library? Can I work around it without touching the PyPDF library?

2 Answers:

Answer 0: (score: 3)

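So I found the answer. The decimal.Decimal class in Python 3.3 shows some strange behavior. Here is the corresponding StackOverflow question: "Instantiate Decimal class". I added a workaround to the PyPDF2 library and submitted a pull request.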
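For illustration only, here is a minimal sketch of the kind of workaround that would sidestep the error. It assumes the failure comes from passing `context=None` explicitly to `decimal.Decimal.__new__`, which is what the last frame of the traceback suggests. The class name `SafeFloatObject` is hypothetical, and this is not necessarily the patch that was submitted to PyPDF2.

```python
import decimal


class SafeFloatObject(decimal.Decimal):
    """Hypothetical stand-in for a Decimal subclass like the one in the traceback."""

    def __new__(cls, value="0", context=None):
        # Assumption: the Python 3.3 decimal module rejected an explicit
        # context=None, so only forward the context when one was supplied.
        if context is None:
            return decimal.Decimal.__new__(cls, str(value))
        return decimal.Decimal.__new__(cls, str(value), context)


if __name__ == "__main__":
    # Both call styles construct a valid instance.
    print(SafeFloatObject("0.75"))
    print(SafeFloatObject("0.75", decimal.Context(prec=10)))
```

Forwarding the context only when it is set keeps the behavior the same on interpreters that accept `None`, so the same code should run under both Python 2.7 and Python 3.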

Answer 1: (score: 2)

Just to make sure you are aware of the tools that already exist:

  • PDFtk
  • PDFjam (my favorite; requires LaTeX)
  • Ghostscript directly:
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file1.pdf file2.pdf
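If the merge still has to be driven from a Python 3 script, the Ghostscript command above could be wrapped with `subprocess`. The following is a rough sketch under some assumptions: a `gs` executable is on the PATH, Python is 3.5 or newer (for `subprocess.run`), and the helper names `build_gs_command` and `merge_pdfs` are made up for this example.

```python
import subprocess


def build_gs_command(output_path, input_paths, gs_binary="gs"):
    """Build the argument list for the Ghostscript merge command shown above."""
    if not input_paths:
        raise ValueError("need at least one input PDF")
    return [
        gs_binary,
        "-dBATCH",
        "-dNOPAUSE",
        "-q",
        "-sDEVICE=pdfwrite",
        "-sOutputFile=" + output_path,
    ] + list(input_paths)


def merge_pdfs(output_path, input_paths, gs_binary="gs"):
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(build_gs_command(output_path, input_paths, gs_binary), check=True)


if __name__ == "__main__":
    # Assumes file1.pdf and file2.pdf exist in the current directory.
    merge_pdfs("finished.pdf", ["file1.pdf", "file2.pdf"])
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. The trade-off is that Ghostscript becomes an external dependency in place of a Python library.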