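Problems with PyPDF2, specifically when splitting and rewriting files!
I'm opening a file on my Ubuntu server, splitting it into individual pages (at most 3 pages), and writing them to the file system (and then uploading them to S3). No error is raised when the files are written, but they cannot be opened after being downloaded from S3 and, as you will see below, they cannot be opened on the server either.
Any ideas?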
    from PyPDF2 import PdfFileReader, PdfFileWriter

    inputpdf = PdfFileReader(open(fi, 'rb'))
    print('breaking file into %s pages' % inputpdf.numPages)  # 17 pages
    for i in range(min(3, inputpdf.numPages)):
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        new_fi = fi[:-4] + '_page_%s.pdf' % i  # fi = ./deals/temp_files/test_experian.pdf
        with open(new_fi, 'wb') as outputStream:
            output.write(outputStream)  # successfully writes all files
            pdf_check = open(new_fi, 'rb')
            print('opened PDF')
            read_pdf = PdfFileReader(pdf_check)  # error raised here -> "EOF marker not found"
            print('loaded PDF')
            page_content = read_pdf.getPage(0).extractText()
            print(page_content.encode('utf-8'))
Answer 0 (score: 0)
Cause of the error:
The file is being read while it is still open in write mode.
Solution:
    for i in range(min(3, inputpdf.numPages)):
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        new_fi = fi[:-4] + '_page_%s.pdf' % i
        with open(new_fi, 'wb') as outputStream:
            output.write(outputStream)
        # The with block has ended, so outputStream is flushed and closed.
        # Only now is it safe to reopen the file for reading.
        with open(new_fi, 'rb') as pdf_check:
            print('opened PDF')
            read_pdf = PdfFileReader(pdf_check)
            print('loaded PDF')
            page_content = read_pdf.getPage(0).extractText()
            print(page_content.encode('utf-8'))
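By using
    with open(new_fi, 'wb') as outputStream:
you create a file handle in write mode.
That file is only closed when the "with" block ends.
So when you try to read it inside the block, read_pdf raises an error, because the file has not been closed (and its buffered data has not been flushed to disk) before it is opened again for reading.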
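To see the effect without PyPDF2, here is a minimal sketch using only the standard library. The helper name `read_during_and_after_write` and the payload bytes are made up for illustration; the payload is not a valid PDF. It writes a few bytes, reads the file once while the writer is still open and once after the "with" block has ended. With Python's default buffering, a small write typically has not reached the disk yet on the first read, which is consistent with PyPDF2 reporting a missing EOF marker.

```python
import os
import tempfile


def read_during_and_after_write(payload: bytes):
    """Return (bytes seen while the writer is open, bytes seen after it closed)."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    try:
        with open(path, "wb") as out:
            out.write(payload)
            # The writer is still open here, so a small payload is usually
            # still sitting in Python's write buffer, not in the file.
            with open(path, "rb") as check:
                during = check.read()
        # The with block has ended: the writer was flushed and closed.
        with open(path, "rb") as check:
            after = check.read()
    finally:
        os.remove(path)
    return during, after


# Illustrative bytes only; this is not a real PDF document.
payload = b"%PDF-1.4\n... page data ...\n%%EOF\n"
during, after = read_during_and_after_write(payload)
print("while writer open:", len(during), "bytes")
print("after writer closed:", len(after), "bytes")
```

Calling outputStream.flush() before reopening the file would also make the data visible, but letting the "with" block finish first is simpler and closes the handle as well.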