Question

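I'm writing code that reads a PDF online and returns a set of keywords found in the document. However, I keep running into problems with the extractText() function in the PyPDF2 package.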
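For the keyword step itself (once the text has been extracted), a minimal sketch might look like the following. The function name `find_keywords` and the sample keyword list are hypothetical, and it assumes keywords are plain words (whole-word, case-insensitive matching via `\b`).

```python
import re


def find_keywords(text, keywords):
    """Return the subset of `keywords` that occur in `text` as whole words, ignoring case."""
    found = set()
    for keyword in keywords:
        # \b anchors the match at word boundaries, so "pdf" does not match inside "pdfs"
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(keyword)
    return found


# Hypothetical sample text and keyword list, for illustration only
sample = "PyPDF2 reads the PDF; extraction of text can fail."
print(sorted(find_keywords(sample, ["pdf", "text", "image"])))
```

Word-boundary matching avoids false hits such as "pdf" inside "PyPDF2"; keywords containing punctuation (e.g. "C++") would need a different boundary rule.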

Here is my code for opening and reading the PDF:

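import io

import PyPDF2
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2.7, as in the traceback below

x = "myurl.pdf"  # placeholder: the full URL of the PDF goes here
if ".pdf" in x:
    # Download the raw PDF bytes with a browser-like User-Agent header
    remoteFile = urlopen(Request(x, headers={"User-Agent": "Magic-Browser"})).read()
    # PDF data is binary, so wrap it in BytesIO (StringIO only accepts text on Python 3)
    memoryFile = io.BytesIO(remoteFile)
    pdfFile = PyPDF2.PdfFileReader(memoryFile, strict=False)
    num_pages = pdfFile.numPages
    text = ""
    # Append the extracted text of every page
    for count in range(num_pages):
        pageObj = pdfFile.getPage(count)
        text += pageObj.extractText()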
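As written, one page whose content stream fails to parse aborts the whole loop. A defensive variant, sketched here under the assumption that losing the text of unparseable pages is acceptable, catches `decimal.InvalidOperation` (the exception in the traceback) per page and records which pages were skipped. The helper name `extract_all_text` is hypothetical; it only relies on each page object having an `extractText()` method.

```python
import decimal


def extract_all_text(pages):
    """Concatenate extractText() output from each page.

    Pages whose extraction raises decimal.InvalidOperation are skipped;
    their 0-based indices are returned so the caller knows text is missing.
    """
    parts = []
    failed = []
    for index, page in enumerate(pages):
        try:
            parts.append(page.extractText())
        except decimal.InvalidOperation:
            # This page's content stream contains a number Decimal cannot parse
            failed.append(index)
    return "".join(parts), failed
```

With the reader from the question it could be called as `text, failed = extract_all_text(pdfFile.getPage(i) for i in range(pdfFile.numPages))`. This only works around the symptom: the skipped pages contribute no text, so their keywords would be missed.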

The error I keep getting on the extractText() line is this:

Traceback (most recent call last):
  File "errortest.py", line 30, in <module>
    text += pageObj.extractText()
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2595, in extractText
    content = ContentStream(content, self.pdf)
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2674, in __init__
    self.__parseContentStream(stream)
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2706, in __parseContentStream
    operands.append(readObject(stream, None))
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 98, in readObject
    return NumberObject.readFromStream(stream)
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 271, in readFromStream
    return FloatObject(num)
  File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 231, in __new__
    return decimal.Decimal.__new__(cls, str(value))
  File "/anaconda2/lib/python2.7/decimal.py", line 547, in __new__
    "Invalid literal for Decimal: %r" % value)
  File "/anaconda2/lib/python2.7/decimal.py", line 3872, in _raise_error
    raise error(explanation)
decimal.InvalidOperation: Invalid literal for Decimal: '99.-72'
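The last frames show PyPDF2's `NumberObject.readFromStream` handing the string '99.-72' to `decimal.Decimal`. PyPDF2 1.x appears to read a whole run of sign, digit, and dot characters as a single number token, so if the PDF writes two operands without whitespace between them ("99." then "-72"), they are likely glued into one invalid literal. The sketch below illustrates that reading of the error with the standard library only: `Decimal` rejects the glued string, while a tokenizer that starts a new number at each sign (the `NUMBER` pattern is an assumption for illustration, not PyPDF2's code) splits it into two valid values.

```python
import decimal
import re

# One numeric operand: optional sign, then digits with an optional decimal
# point ("99.", "3.5") or a leading-dot fraction (".5").
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")

glued = "99.-72"  # the literal from the traceback

try:
    decimal.Decimal(glued)
except decimal.InvalidOperation:
    print("Decimal rejects", repr(glued))

# Splitting at the second sign yields two parseable operands
operands = [decimal.Decimal(tok) for tok in NUMBER.findall(glued)]
print(operands)
```

This only demonstrates the parsing problem; it is not a patch for PyPDF2.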

If anyone can help me, that would be great! Thanks!

.extractText() returns "Invalid literal for Decimal"
