I am trying to merge PDFs downloaded from Google Drive, but I get this error:
ValueError: invalid literal for int() with base 10: b'F-1.4'
This does not happen when I merge PDFs generated with Keynote.
The full error is:
Traceback (most recent call last):
  File "weekly_meeting.py", line 36, in <module>
    file_path = sort_pdf(path)
  File "weekly_meeting.py", line 15, in sort_pdf
    pdf_merger.append(file)
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/merger.py", line 203, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/merger.py", line 151, in merge
    outline = pdfr.getOutlines()
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1346, in getOutlines
    lines = catalog["/Outlines"]
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/generic.py", line 516, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/generic.py", line 178, in getObject
    return self.pdf.getObject(self).getObject()
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1599, in getObject
    idnum, generation = self.readObjectHeader(self.stream)
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1667, in readObjectHeader
    return int(idnum), int(generation)
ValueError: invalid literal for int() with base 10: b'F-1.4'
Here is the code I tried; the problem seems to be at pdf_merger.append(file):
import glob
import os

from PyPDF2 import PdfFileMerger


def sort_pdf(path):
    pdf_merger = PdfFileMerger()
    if os.path.isdir(path):
        head, file_name = os.path.split(path)
        os.chdir(path)
        chronology = ["OVERVIEW", "CUSTOMER", "PROJECT", "PERSONAL"]
        # Append the PDFs group by group, in the order given by the prefixes
        for prefix in chronology:
            for file in glob.glob(prefix + "*.pdf"):
                pdf_merger.append(file)
        file_path = path + "/" + file_name + ".pdf"
        with open(file_path, 'wb') as result:
            pdf_merger.write(result)
        return file_path
I expect the output to be a single sorted, combined PDF, which already works with other documents.
Answer 0 (score: 0)
It looks like your input PDF is corrupted. This b'F-1.4' should read b'%PDF-1.4' – stovfl
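One way to test the comment's hypothesis is to look at the first bytes of each input file. The sketch below is an illustration only: has_pdf_header, file_has_pdf_header and read_object_header are hypothetical helpers, not PyPDF2 functions. read_object_header is a simplified stand-in for the parser named in the last traceback frame, and SAMPLE is made-up data. It suggests how a token like b'F-1.4' could arise if a reader starts parsing an object header at an offset that points into the file header instead of at an object.

```python
import io

# Made-up minimal data: a PDF version line followed by one object header.
SAMPLE = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"


def has_pdf_header(data):
    """Return True if the bytes start with the PDF version marker."""
    return data.startswith(b"%PDF-")


def file_has_pdf_header(path):
    """Check only the first five bytes of a file on disk."""
    with open(path, "rb") as handle:
        return has_pdf_header(handle.read(5))


def read_object_header(stream):
    """Simplified stand-in (NOT PyPDF2's code): parse 'N G obj' at the
    current stream position and return (N, G) as integers."""
    tokens = stream.read(32).split()
    return int(tokens[0]), int(tokens[1])


stream = io.BytesIO(SAMPLE)

# Offset 9 is where the object header "1 0 obj" starts in SAMPLE.
stream.seek(9)
print(read_object_header(stream))

# Offset 3 points into the version line, so the first token is b"F-1.4".
stream.seek(3)
try:
    read_object_header(stream)
except ValueError as error:
    print("parse failed:", error)
```

If file_has_pdf_header returns True for a file that still fails, the header itself is probably intact and the problem is more likely a bad object offset inside the file, which would fit the fact that the traceback fails while resolving /Outlines.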
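Using PdfFileReader and PdfFileWriter instead of PdfFileMerger, with the following code, solved this for me:
# Assumes: from PyPDF2 import PdfFileReader, PdfFileWriter
#          from PyPDF2.utils import b_
# and that pdf_writer = PdfFileWriter() was created beforehand.
for file in glob.glob(prefix + "*.pdf"):
    pdf_reader = PdfFileReader(file)
    pdf_reader._header = b_("%PDF-1.4")
    for page in range(pdf_reader.getNumPages()):
        pdf_writer.addPage(pdf_reader.getPage(page))
It simply overwrites the header entirely.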
Answer 1 (score: 0)
This worked for me. It is based on this; I only completed the code with the import statement and fixed the indentation problems.
import PyPDF2

pdfs = ['1.pdf', '2.pdf', '3.pdf']
pdfWriter = PyPDF2.PdfFileWriter()

# loop through all PDFs
for filename in pdfs:
    # rb for read binary; the file has to stay open until the writer is done
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    # add each page of the PDF to the writer
    for pageNum in range(pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

# save PDF to file, wb for write binary
pdfOutput = open('output.pdf', 'wb')
# write the combined pages
pdfWriter.write(pdfOutput)
# close the output file
pdfOutput.close()
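The hard-coded pdfs list could also be built from the prefix order used in the question. A minimal sketch, assuming the files sit in one folder and are named with the question's prefixes; ordered_pdfs and CHRONOLOGY are hypothetical names introduced here. It sorts within each prefix group because glob.glob does not guarantee an order, and it avoids os.chdir by joining the folder into the pattern.

```python
import glob
import os

# Prefix order taken from the question's code.
CHRONOLOGY = ["OVERVIEW", "CUSTOMER", "PROJECT", "PERSONAL"]


def ordered_pdfs(folder, prefixes=CHRONOLOGY):
    """Return PDF paths from folder, grouped in prefix order and
    sorted by name within each group."""
    paths = []
    for prefix in prefixes:
        # glob.escape protects special characters in the folder name only
        pattern = os.path.join(glob.escape(folder), prefix + "*.pdf")
        paths.extend(sorted(glob.glob(pattern)))
    return paths
```

The result can then replace the literal list, for example pdfs = ordered_pdfs("/path/to/folder"), where the folder path is a placeholder.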