I am trying to use the Apache Tika library to parse a PDF file. I am on Python 3 and installed
the library with pip install tika.
Code:
from tika import parser
parsedPDF = parser.from_file("test.pdf",serverEndpoint='http://localhost:9998')
or
from tika import parser
parsedPDF = parser.from_file("test.pdf")
Error:
Traceback (most recent call last):
  File "tikaparsing-test.py", line 2, in <module>
    parsedPDF = parser.from_file("test.pdf",serverEndpoint='http://localhost:9998')
  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\parser.py", line 36, in from_file
    jsonOutput = parse1('all', filename, serverEndpoint, headers=headers)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 316, in parse1
    headers, verbose, tikaServerJar, rawResponse=rawResponse)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 510, in callServer
    serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 565, in checkTikaServer
    startServer(jarPath, serverHost, port, classpath)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 609, in startServer
    cmd = Popen(cmd , stdout= logFile, stderr = STDOUT, shell =True)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
PermissionError: [WinError 5] Access is denied
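
The traceback ends inside startServer, so the client was trying to launch a Tika server itself when the error was raised, which suggests it did not find one already running at the endpoint. A minimal sketch for checking whether anything is listening on localhost:9998 before calling the parser. The helper name is_port_open is hypothetical and uses only the Python standard library, not any tika API, and it does not by itself explain the Access is denied error:

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for diagnosis only; not part of the tika package.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9998 is the port taken from the serverEndpoint used in the code above.
    print("Tika server reachable:", is_port_open("localhost", 9998))
```

If this prints False, no server is accepting connections on that port, which would be consistent with the client attempting to start one.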