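I'm writing something that reads PDFs online and returns the set of keywords found in each document. However, I keep running into a problem with the extractText() function from the PyPDF2 package.
Here is my code that opens and reads the PDF: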
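from StringIO import StringIO  # Python 2.7, as in the traceback
from urllib2 import Request, urlopen

import PyPDF2

x = "myurl.pdf"  # placeholder for the real PDF URL
if ".pdf" in x:
    # Download the PDF and wrap the raw bytes in an in-memory file object
    remoteFile = urlopen(Request(x, headers={"User-Agent": "Magic-Browser"})).read()
    memoryFile = StringIO(remoteFile)
    pdfFile = PyPDF2.PdfFileReader(memoryFile, strict=False)
    num_pages = pdfFile.numPages
    count = 0
    text = ""
    # Concatenate the extracted text of every page
    while count < num_pages:
        pageObj = pdfFile.getPage(count)
        count += 1
        text += pageObj.extractText()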
The error I keep getting on the extractText() line is this:
Traceback (most recent call last):
  File "errortest.py", line 30, in <module>
    text += pageObj.extractText()
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2595, in extractText
    content = ContentStream(content, self.pdf)
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2674, in __init__
    self.__parseContentStream(stream)
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2706, in __parseContentStream
    operands.append(readObject(stream, None))
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 98, in readObject
    return NumberObject.readFromStream(stream)
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 271, in readFromStream
    return FloatObject(num)
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 231, in __new__
    return decimal.Decimal.__new__(cls, str(value))
  File "/anaconda2/lib/python2.7/decimal.py", line 547, in __new__
    "Invalid literal for Decimal: %r" % value)
  File "/anaconda2/lib/python2.7/decimal.py", line 3872, in _raise_error
    raise error(explanation)
decimal.InvalidOperation: Invalid literal for Decimal: '99.-72'
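To narrow it down: the last frame of the traceback can be reproduced with the standard library alone, since '99.-72' is simply not a string that decimal.Decimal accepts. The sketch below shows that check, plus one possible stopgap that skips any page whose extraction raises this error. It is only a workaround under assumptions, not a fix for the malformed number in the PDF: `is_valid_decimal` and `extract_pages` are hypothetical helper names, `extract` stands in for something like `lambda page: page.extractText()`, and any text on a skipped page is lost.

```python
import decimal


def is_valid_decimal(token):
    """Return True if decimal.Decimal accepts the token, False otherwise."""
    try:
        decimal.Decimal(token)
        return True
    except decimal.InvalidOperation:
        return False


def extract_pages(pages, extract):
    """Collect text page by page, skipping pages whose extraction fails.

    `pages` is any iterable of page objects and `extract` is a callable
    returning the text of one page. Both are hypothetical stand-ins,
    e.g. extract=lambda page: page.extractText().
    Returns a tuple (text, skipped_page_indexes).
    """
    chunks, skipped = [], []
    for index, page in enumerate(pages):
        try:
            chunks.append(extract(page))
        except decimal.InvalidOperation:
            # A malformed number was hit on this page: record it and move on.
            skipped.append(index)
    return "".join(chunks), skipped


print(is_valid_decimal("99.72"))   # a well-formed number
print(is_valid_decimal("99.-72"))  # the token from the traceback
```

Returning the skipped page indexes alongside the text at least makes it visible which pages were dropped, so the keyword results are not silently incomplete.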
If anyone can help me with this, that would be great. Thanks!