PyPDF2 and problems merging large files / Merging large PDFs with Python

Date: 2018-02-07 10:01:03

Tags: python pdf merge large-files pypdf2


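I am trying to merge PDF files in Python using PyPDF2.
The problem is the file size: the merge fails with a MemoryError.
Is there another way to merge the files that has no file-size limit and does not run out of memory?

Size of file 1 = 900 MB
Size of file 2 = 300 MB

My idea: is there a way to load only the last page of the first PDF and append the second PDF to it?

My code: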

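from PyPDF2 import PdfFileMerger, PdfFileReader

merger = PdfFileMerger()

filename1 = 'document-output3.pdf'
filename2 = 'file1.pdf'

# Keep the input files open until the merged output has been written:
# PyPDF2 still reads objects from these handles during write().
with open(filename1, 'rb') as f1, open(filename2, 'rb') as f2:
    merger.append(PdfFileReader(f1))
    merger.append(PdfFileReader(f2))

    # Write to a new file rather than to one of the inputs
    # (the traceback shows "document-output5.pdf" as the output name).
    merger.write("document-output5.pdf")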

- Error message -

Traceback (most recent call last):
  File "C:\Users\USERNAME\eclipse-workspace\PyPDF2\MergePDF\mergepdf.py", line 13, in <module>
    merger.write("document-output5.pdf")
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\merger.py", line 230, in write
    self.output.write(fileobj)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 586, in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PyPDF2\generic.py", line 611, in readFromStream
    data["__streamdata__"] = stream.read(length)
MemoryError
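
One detail visible in the traceback: the interpreter lives under Python36-32, which suggests a 32-bit build. A 32-bit process can address at most 4 GB, and usually less in practice, so roughly 1.2 GB of input plus PyPDF2's in-memory copies may simply not fit. Whether this is the actual cause here is an assumption, not something the traceback proves. A minimal sketch for checking it, using only the standard library (the function names interpreter_bits and total_size_mb are hypothetical, chosen for illustration):

```python
import os
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter (32 or 64)."""
    # struct.calcsize("P") is the size in bytes of a C pointer.
    return struct.calcsize("P") * 8


def total_size_mb(paths):
    """Return the combined on-disk size of the given files in megabytes."""
    return sum(os.path.getsize(p) for p in paths) / (1024 * 1024)


if __name__ == "__main__":
    # Pass the PDF paths on the command line; non-existent paths are skipped.
    inputs = [p for p in sys.argv[1:] if os.path.isfile(p)]
    print("interpreter: %d-bit" % interpreter_bits())
    print("combined input size: %.1f MB" % total_size_mb(inputs))
```

If this reports a 32-bit interpreter, running the same merge script under a 64-bit Python build would be a reasonable first experiment before changing libraries.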

Thank you for your attention,
Fabian

0 answers:

No answers yet