Cannot recover from stack overflow in pdfplumber

Time: 2019-05-26 12:52:27

Tags: python python-3.x pdf pdfminer

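I am trying to read a 1200-page PDF with Python 3 and pdfplumber. Once the PDF has been instantiated with pdfplumber, any operation on that instance causes a stack overflow. Is there a way, in pdfplumber or in Python generally, to read the PDF part by part?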

I tried accessing only a single page, but once the PDF instance has been created, any operation on it triggers the stack overflow.

import pdfplumber

pdf_instance = pdfplumber.from_path(pdf_path)

# This line triggers the fatal error
pdf_page = pdf_instance.pages[0]

The error message looks like this:

Fatal Python error: Cannot recover from stack overflow.

Current thread 0x00007f36c68bf700 (most recent call first):
  File "/home/akash/anaconda3/lib/python3.6/logging/__init__.py", line 1546 in isEnabledFor
  File "/home/akash/anaconda3/lib/python3.6/logging/__init__.py", line 1293 in debug
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/psparser.py", line 544 in add_results
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/pdfparser.py", line 69 in do_keyword
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/psparser.py", line 616 in nextobject
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/pdfdocument.py", line 669 in _getobj_parse
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/pdfdocument.py", line 691 in getobj
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/pdftypes.py", line 71 in resolve
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/pdftypes.py", line 84 in resolve1
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/pdftypes.py", line 164 in dict_value
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/pdfpage.py", line 88 in search
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/pdfpage.py", line 100 in search
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/pdfpage.py", line 100 in search
  File "/home/akash/MAY_23/env_doc/lib/python3.6/site-packages/pdfminer/pdfpage.py", line 100 in search

...

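Because the instance is too large, the solution I am looking for is to split the PDF into 4-5 parts of 250-300 pages each, operate on each part separately, and merge the results afterwards.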
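The splitting step described above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: it uses pypdf, a library that is not mentioned in the question, and it has not been tested against the file in question, so it may or may not get past whatever is causing the deep recursion in pdfminer. The names `chunk_ranges`, `split_pdf`, and the `part_NN.pdf` output naming are hypothetical helpers introduced for illustration.

```python
from pathlib import Path


def chunk_ranges(total_pages, chunk_size):
    """Return (start, stop) page-index pairs, 0-based and stop-exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(pdf_path, out_dir, chunk_size=300):
    """Write consecutive chunks of pdf_path into out_dir and return their paths.

    Assumption: pypdf is installed and is able to read this particular file.
    """
    # Imported lazily so chunk_ranges stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    part_paths = []
    ranges = chunk_ranges(len(reader.pages), chunk_size)
    for part_no, (start, stop) in enumerate(ranges, 1):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        # Hypothetical naming scheme: part_01.pdf, part_02.pdf, ...
        part_path = out_dir / f"part_{part_no:02d}.pdf"
        with open(part_path, "wb") as handle:
            writer.write(handle)
        part_paths.append(part_path)
    return part_paths
```

For a 1200-page file, `chunk_ranges(1200, 300)` yields four ranges of 300 pages each. Each part written by `split_pdf` could then be opened with pdfplumber on its own, and the per-part results concatenated in order.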

0 answers:

No answers yet