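This is driving me crazy. I'm writing a Python 2.7 program that uses PyPDF2 to work with PDFs. It loops through a main PDF document page by page, extracts each page's text to determine which other PDF that page corresponds to, and then replaces the first page of that named (second) PDF with the currently selected page.
The earlier steps of the program work fine, but things break when I try to write to the file after reading from it.
The full program is much longer, but this is the minimum needed to reproduce the error: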
#! python2
from pyPdf import PdfFileWriter, PdfFileReader
dir = [path to target directory]
inp = PdfFileReader(open([path to main PDF], "rb"))
temp = dir+"/"+[other target PDF filename]
outp = PdfFileWriter()
outp.addPage(inp.getPage(0)) #in the full version, this appends page "i"-- the iterator for the loop-- instead of zero
f = open(temp,"rb")
inp2 = PdfFileReader(f)
for i in range(1,inp2.numPages):
    outp.addPage(inp2.getPage(i))
f.close()
g = open(temp,"wb")
print g.closed #prints "False"
g.seek(0) #I added this on a hunch, but it makes no difference.
outp.write(g) #It crashes here. It doesn't crash on any "g.seek" or even "g.truncate()"
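I know it's a bit clumsy stylistically, but I originally tried this with more standard "with" blocks in various configurations (yes, I know they close the file when the block exits, and I watched my indentation carefully).
After that, I tried a single open statement in "r+b" mode (the only reason for separating them in the first place was to accommodate the rare case where a user wants to save the output file somewhere else). That consistently produced IOError: [Errno 0] Error.
I switched to assigning separate file objects directly because it seemed more manageable.
The error it currently produces is as follows: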
Traceback (most recent call last):
  File "errorversion.py", line 16, in <module>
    outp.write(g)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 264, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 324, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 345, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 645, in getObject
    self.stream.seek(start, 0)
ValueError: I/O operation on closed file
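Reading the traceback above, the exception is raised from self.stream.seek inside getObject while outp.write(g) is running. That suggests the reader only fetches page objects from its source stream at write time, so a source file closed before the write is no longer usable. A minimal sketch of that behavior follows. It uses only the standard library, and LazyReader and demo are hypothetical stand-ins invented for illustration, not part of pyPdf. The second half shows one possible way around it, under the assumption that the source fits in memory: copy the source into an io.BytesIO first, so the file on disk can be closed and reopened for writing.

```python
import io
import os
import tempfile


class LazyReader(object):
    """Hypothetical stand-in for a reader that keeps a reference to its
    stream and only reads from it when a chunk is requested."""

    def __init__(self, stream, chunk_size=4):
        self.stream = stream
        self.chunk_size = chunk_size

    def get_chunk(self, index):
        # Deferred read: the stream is touched here, not in __init__.
        self.stream.seek(index * self.chunk_size, 0)
        return self.stream.read(self.chunk_size)


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as fh:
            fh.write(b"AAAABBBBCCCC")

        # Case 1: the source file is closed before the deferred read.
        f = open(path, "rb")
        reader = LazyReader(f)
        f.close()
        try:
            reader.get_chunk(1)
            closed_error = None
        except ValueError as exc:
            closed_error = str(exc)

        # Case 2: the reader works on an in-memory copy, so the file on
        # disk can be closed and then truncated for writing.
        with open(path, "rb") as fh:
            reader = LazyReader(io.BytesIO(fh.read()))
        with open(path, "wb") as out:
            out.write(b"ZZZZ")               # replacement first chunk
            out.write(reader.get_chunk(1))   # remaining chunks, read lazily
            out.write(reader.get_chunk(2))

        with open(path, "rb") as fh:
            return closed_error, fh.read()
    finally:
        os.remove(path)
```

In the reproduction script, the analogous change would be to keep f open until after outp.write(...) has finished, or to hand the reader an in-memory copy of the file. Whether either is enough for the full program is untested here.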
The earlier error (the one produced by opening the file once for both reading and writing) was as follows:
Traceback (most recent call last):
  File "RemoveAndAdd.py", line 45, in <module>
    main()
  File "RemoveAndAdd.py", line 40, in main
    output.write(f)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 283, in write
    obj.writeToStream(stream, key)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 513, in writeToStream
    value.writeToStream(stream, encryption_key)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 133, in writeToStream
    data.writeToStream(stream, encryption_key)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 215, in writeToStream
    stream.write(repr(self))
IOError: [Errno 0] Error
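One plausible reading of this second traceback is that the single "r+b" handle was being read (by the reader, on demand) and written (by the writer) in the same pass, with no repositioning between the two. That is an assumption, not something the traceback proves. A sketch of a pattern that avoids interleaving reads and writes on one handle: do all the reading first, build the output in an io.BytesIO, then seek, truncate, and write once. replace_first_chunk and demo are hypothetical names used only for this illustration, and plain bytes stand in for PDF pages.

```python
import io
import os
import tempfile


def replace_first_chunk(path, new_first, chunk_size=4):
    """Hypothetical helper: rewrite `path` in place, replacing its first
    `chunk_size` bytes with `new_first`."""
    with open(path, "r+b") as fh:
        out = io.BytesIO()
        out.write(new_first)
        fh.seek(chunk_size, 0)   # skip the old first chunk
        out.write(fh.read())     # all reading finishes here
        fh.seek(0)               # reposition before switching to writing
        fh.truncate()            # drop the old contents
        fh.write(out.getvalue())


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as fh:
            fh.write(b"AAAABBBBCCCC")
        replace_first_chunk(path, b"ZZ")
        with open(path, "rb") as fh:
            return fh.read()
    finally:
        os.remove(path)
```

With a PDF writer, the equivalent would be to write into the BytesIO buffer instead of the file, and only then overwrite the file with the buffer's contents.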