I am trying to parse a PDF file with the tika library, but I ran into this complicated error:
Traceback (most recent call last):
File "/home/olivia/.local/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/home/olivia/.local/lib/python3.6/site-packages/urllib3/util/connection.py", line 83, in create_connection
raise err
File "/home/olivia/.local/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
The same error occurs when using its wrapper:
import tika
from tika import parser
parsed = parser.from_file('simple1.pdf')
print(parsed["content"])
Detailed error: see
Answer 0 (score: 0)
Specify the full path to the PDF and use forward slashes, for example:
from tika import parser
parsedPDF=parser.from_file('C:/Users/xyzuser/Documents/abc.pdf')
parsedPDF
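If you would rather not type the full path by hand, here is a minimal sketch that builds it with `pathlib`. The helper name `to_full_posix_path` is made up for illustration and is not part of tika. It only rules out a path problem and does not touch the connection itself.

```python
from pathlib import Path


def to_full_posix_path(name: str) -> str:
    """Return the absolute path of `name`, written with forward slashes.

    Hypothetical helper for illustration; not part of the tika package.
    """
    # resolve() makes the path absolute relative to the current working
    # directory; as_posix() renders it with "/" separators on every platform.
    return Path(name).expanduser().resolve().as_posix()
```

The returned string can then be passed to `parser.from_file(...)` in place of the bare file name.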
Answer 1 (score: 0)
Download the tika jars (tika-app.jar, tika-server.jar and tika-server.jar.md5) from [https://tika.apache.org/download.html][1]
Keep these files (renamed to tika-app.jar, tika-server.jar and tika-server.jar.md5) in the /tmp folder on Linux, or in C:\Users\<user>\AppData\Local\Temp\ on Windows.
from tika import parser
parsedPDF = parser.from_file("/path/to/file/my_pdf.pdf")
print(parsedPDF["metadata"])
print(parsedPDF["content"].encode('ascii', errors='ignore'))
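To confirm the files really ended up in the right place, a quick check along these lines may help. It is only a sketch: `missing_tika_files` is a name I made up, and it assumes `tempfile.gettempdir()` points at the same temp folder described above (usually /tmp on Linux and the user's AppData\Local\Temp on Windows, unless an environment variable overrides it).

```python
import tempfile
from pathlib import Path

# File names taken from the answer above.
EXPECTED_FILES = ("tika-app.jar", "tika-server.jar", "tika-server.jar.md5")


def missing_tika_files(folder=None):
    """Return the expected file names that are not present in `folder`.

    Hypothetical helper for illustration. When `folder` is None, the
    system temp directory reported by tempfile.gettempdir() is used.
    """
    base = Path(folder) if folder is not None else Path(tempfile.gettempdir())
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]
```

An empty list means all three files were found; otherwise the list names the ones still missing.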
Answer 2 (score: -1)
You only need to make a small change to your code, as follows:
parsed = parser.from_file('simple1.pdf','http://localhost:9998/tika')
Worked for me, hope it works for you too :)
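Since the traceback ends in "Connection refused", it may be worth checking first whether anything is listening on that address at all. A rough sketch using only the standard library: `is_port_open` is a made-up helper name, and the defaults `localhost` and `9998` are simply taken from the URL in this answer.

```python
import socket


def is_port_open(host="localhost", port=9998, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration; the defaults mirror the
    http://localhost:9998/tika URL used above.
    """
    try:
        # create_connection raises an OSError subclass (for example
        # ConnectionRefusedError or a timeout) when nothing answers.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False, the server is not reachable at that address, and `parser.from_file` pointed at that URL would presumably fail with the same connection error.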