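I'm trying to extract tables with tabula, but because tabula was originally written in Java, it needs to call the Java runtime. I have Java installed on my macOS machine, but I suspect it isn't configured in PyCharm. So when I run tabula, I get the following error: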
No Java runtime present, requesting install.
Error:
Traceback (most recent call last):
  File "/Users/rohank2/salesorderautomation/test.py", line 138, in <module>
    text = pdf_to_text(filename) # call to pdftotext function
  File "/Users/rohank2/salesorderautomation/test.py", line 53, in pdf_to_text
    df = read_pdf(filename)
  File "/Users/rohank2/Library/Python/3.6/lib/python/site-packages/tabula/wrapper.py", line 87, in read_pdf
    output = subprocess.check_output(args)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-jar', '/Users/rohank2/Library/Python/3.6/lib/python/site-packages/tabula/tabula-1.0.2-jar-with-dependencies.jar', '--pages', '1', '--guess', './testdataset/test12.pdf']' returned non-zero exit status 1.
Process finished with exit code 1
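The traceback shows that tabula simply launches `java -jar ...` as a subprocess, so one way to check whether the problem is the environment PyCharm hands to Python (rather than tabula itself) is to ask the same interpreter which `java` it would run. Below is a minimal diagnostic sketch using only the standard library; the helper name `describe_java` is hypothetical. On macOS, `/usr/bin/java` can be a stub that prints "No Java runtime present" and exits with a non-zero status when no JDK is registered, so a found path alone does not prove a working runtime.

```python
import os
import shutil
import subprocess


def describe_java(env=None):
    """Report which `java` this process would launch and what version it reports.

    `env` defaults to os.environ, i.e. the environment PyCharm gave Python.
    Returns a dict with the PATH searched, the resolved java path (or None),
    and the version text (or None if java is missing or exits non-zero).
    """
    env = dict(os.environ if env is None else env)
    search_path = env.get("PATH", "")
    # shutil.which searches the given PATH roughly the way subprocess resolves "java".
    path = shutil.which("java", path=search_path)
    info = {"PATH": search_path, "java": path, "version": None}
    if path is None:
        return info
    try:
        # `java -version` writes its banner to stderr, not stdout.
        result = subprocess.run(
            [path, "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=env,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return info
    if result.returncode == 0:
        info["version"] = result.stderr.decode(errors="replace").strip()
    return info


if __name__ == "__main__":
    # Run this once from PyCharm and once from a terminal, then compare the output.
    for key, value in describe_java().items():
        print(key, "=", value)
```

If the terminal run finds a real JDK but the PyCharm run does not (or gets `version = None`), the two are most likely seeing different `PATH`/`JAVA_HOME` values.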
This is how I'm trying to access the data:
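from tabula import read_pdf

def pdf_to_text(pdfname):
    # PDFMiner boilerplate
    df = read_pdf(pdfname)  # use the function's parameter, not the global `filename`
    print(list(df))  # column names of the extracted table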
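If the diagnosis is that PyCharm's process simply cannot see the JDK, one possible workaround is to put the JDK's `bin` directory on `PATH` from inside the script before calling `read_pdf`, since child processes started by `subprocess` inherit `os.environ`. This is a sketch under assumptions: the helper `ensure_java_on_path` is hypothetical, and it relies on `JAVA_HOME` or, on macOS, the `/usr/libexec/java_home` tool to locate an installed JDK. Setting the same variables in PyCharm's Run Configuration ("Environment variables") is an alternative that avoids changing code.

```python
import os
import subprocess


def ensure_java_on_path(java_home=None):
    """Prepend <java_home>/bin to this process's PATH so child processes find `java`.

    Hypothetical helper. If java_home is None, fall back to $JAVA_HOME, then to
    macOS's /usr/libexec/java_home tool. Returns the bin directory used, or None
    if no usable JDK was found (PATH is left unchanged in that case).
    """
    java_home = java_home or os.environ.get("JAVA_HOME")
    if not java_home and os.path.exists("/usr/libexec/java_home"):
        try:
            # Prints the home directory of the default installed JDK; exits
            # non-zero if none is installed.
            java_home = subprocess.check_output(
                ["/usr/libexec/java_home"], stderr=subprocess.DEVNULL
            ).decode().strip()
        except (OSError, subprocess.CalledProcessError):
            java_home = None
    if not java_home:
        return None
    bin_dir = os.path.join(java_home, "bin")
    # Assumes a macOS/Linux layout where the launcher is bin/java.
    if not os.path.isfile(os.path.join(bin_dir, "java")):
        return None
    os.environ["JAVA_HOME"] = java_home
    os.environ["PATH"] = bin_dir + os.pathsep + os.environ.get("PATH", "")
    return bin_dir
```

Usage: call `ensure_java_on_path()` once near the top of the script, then call `pdf_to_text(filename)` as before. If it returns `None`, no JDK was found and installing one is the next step.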