Apache Tika Server problem: unable to read a PDF file

Time: 2018-06-22 14:22:00

Tags: python-3.x apache-tika

I am trying to use the Apache Tika library to parse a PDF file and read its contents. I am on Python 3 and installed the library with pip install tika.

Code:

from tika import parser
parsedPDF = parser.from_file("test.pdf",serverEndpoint='http://localhost:9998')

from tika import parser
parsedPDF = parser.from_file("test.pdf")

Error:

Traceback (most recent call last):
  File "tikaparsing-test.py", line 2, in <module>
    parsedPDF = parser.from_file("test.pdf",serverEndpoint='http://localhost:9998')
  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\parser.py", line 36, in from_file
    jsonOutput = parse1('all', filename, serverEndpoint, headers=headers)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 316, in parse1
    headers, verbose, tikaServerJar, rawResponse=rawResponse)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 510, in callServer
    serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 565, in checkTikaServer
    startServer(jarPath, serverHost, port, classpath)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 609, in startServer
    cmd = Popen(cmd , stdout= logFile, stderr = STDOUT, shell =True)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
PermissionError: [WinError 5] Access is denied
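
The traceback shows that the failure occurs in startServer, when the tika client tries to launch the server process itself through Popen, before any PDF parsing takes place. One thing that may be worth checking first is whether anything is already listening on localhost:9998. The following is only a minimal sketch using the Python standard library; the helper name is_port_open is hypothetical and is not part of the tika package, and an open port does not prove that the listener is actually a Tika server.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration; not part of the tika package.
    """
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_port_open("localhost", 9998) before parser.from_file would indicate whether a server is already reachable at the endpoint used in the question, or whether the client is likely to attempt to start one on its own.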

0 answers:

No answers