I am trying to use the slate module to extract text from a PDF file. I installed it like this:
$sudo pip install https://codeload.github.com/timClicks/slate/zip/master
Collecting https://codeload.github.com/timClicks/slate/zip/master
Downloading https://codeload.github.com/timClicks/slate/zip/master
Requirement already satisfied: distribute in /usr/lib/python3.5/site-packages (from slate==0.5.2)
Requirement already satisfied: pdfminer3k in /usr/lib/python3.5/site-packages (from slate==0.5.2)
Requirement already satisfied: setuptools>=0.7 in /usr/lib/python3.5/site-packages (from distribute->slate==0.5.2)
Requirement already satisfied: pytest>=2.0 in /usr/lib/python3.5/site-packages (from pdfminer3k->slate==0.5.2)
Requirement already satisfied: ply>=3.4 in /usr/lib/python3.5/site-packages (from pdfminer3k->slate==0.5.2)
Requirement already satisfied: py>=1.4.29 in /usr/lib/python3.5/site-packages (from pytest>=2.0->pdfminer3k->slate==0.5.2)
Installing collected packages: slate
Found existing installation: slate 0.3
Uninstalling slate-0.3:
Successfully uninstalled slate-0.3
Running setup.py install for slate ... done
Successfully installed slate-0.5.2
This is what I am trying:
#!/usr/bin/python3
import slate
with open('/var/tmp/PhysRevB.93.014203.pdf') as fp:
    doc = slate.PDF(fp)
print(len(doc))
print(doc[0])
This gives me the error:
$python3 tstslt.py
Traceback (most recent call last):
  File "tstslt.py", line 2, in <module>
    import slate
  File "/usr/lib/python3.5/site-packages/slate/__init__.py", line 66, in <module>
    from .classes import PDF
  File "/usr/lib/python3.5/site-packages/slate/classes.py", line 25, in <module>
    import utils
ImportError: No module named 'utils'
I can extract the text with PyPDF2, but I want to see whether slate does a better job.
Answer 0: (score: 1)
According to this issue, one of slate's dependencies (pdfminer) does not support Python 3:
(...)
The required "pdfminer" does not work because it is currently not compatible with Python 3.5.
Its README states this:
https://github.com/euske/pdfminer
"Install Python 2.6 or newer. (Python 3 is not supported.)"
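Separately from the pdfminer limitation, the traceback in the question stops at `import utils` inside `slate/classes.py`. That is an implicit relative import, which Python 3 no longer resolves against the package's own directory. The sketch below reproduces the same kind of failure with a throwaway package; the names `demo_pkg_implicit`, `demo_pkg_explicit` and `demo_helpers` are made up for illustration and are not part of slate.

```python
import importlib
import os
import sys
import tempfile


def build_package(root, name, import_line):
    """Create package `name` under `root` whose __init__.py runs `import_line`."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    # Sibling module that the package tries to import.
    with open(os.path.join(pkg_dir, "demo_helpers.py"), "w") as fh:
        fh.write("VALUE = 42\n")
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
        fh.write(import_line + "\n")


def try_import(name):
    """Return 'ok' if the package imports, otherwise the exception class name."""
    try:
        importlib.import_module(name)
        return "ok"
    except ImportError as exc:
        return type(exc).__name__


root = tempfile.mkdtemp()
sys.path.insert(0, root)

# Python 2 style: Python 3 looks for a top-level module called demo_helpers.
build_package(root, "demo_pkg_implicit", "import demo_helpers")
# Python 3 style: explicitly relative to the package itself.
build_package(root, "demo_pkg_explicit", "from . import demo_helpers")
importlib.invalidate_caches()

implicit_result = try_import("demo_pkg_implicit")
explicit_result = try_import("demo_pkg_explicit")
print(implicit_result, explicit_result)
```

On Python 3 the first import fails and the second succeeds, which suggests the installed slate code was written with Python 2 import semantics in mind.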
Answer 1: (score: 0)
Answer 2: (score: 0)
After installing slate3k, you also have to set the mode when opening the file, i.e. open it in binary mode ('rb'):
#!/usr/bin/python3
import slate
# Open the PDF in binary mode; text mode would try to decode the bytes.
with open('/var/tmp/PhysRevB.93.014203.pdf', 'rb') as fp:
    doc = slate.PDF(fp)
print(len(doc))
print(doc[0])
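The 'rb' matters because a PDF is binary data, and Python 3's default text mode tries to decode it as text. A minimal sketch of that difference, using only the standard library; the helper names `looks_like_pdf`, `text_mode_fails` and `write_sample`, and the sample bytes, are illustrative assumptions and not part of slate or slate3k:

```python
import os
import tempfile

# Hypothetical stand-in for a real PDF: the "%PDF-" header plus a few
# high bytes, similar to the binary marker many PDFs carry on line two.
SAMPLE_BYTES = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def looks_like_pdf(path):
    """Open in binary mode and check for the PDF magic bytes."""
    with open(path, "rb") as fp:
        return fp.read(5) == b"%PDF-"


def text_mode_fails(path):
    """Return True if reading the file as UTF-8 text raises a decode error."""
    try:
        with open(path, encoding="utf-8") as fp:
            fp.read()
    except UnicodeDecodeError:
        return True
    return False


def write_sample():
    """Write SAMPLE_BYTES to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    with os.fdopen(fd, "wb") as fp:
        fp.write(SAMPLE_BYTES)
    return path


sample_path = write_sample()
print(looks_like_pdf(sample_path))
print(text_mode_fails(sample_path))
os.remove(sample_path)
```

With a real file, the binary handle would be passed straight to `slate.PDF(fp)` as in the snippet above.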