I'm trying to use tabula-py to scrape some data from a PDF, but I can't get it to work. I'm running this in a Jupyter notebook (on a Mac):
from tabula import read_pdf
df = read_pdf("/Users/jamesozden/Downloads/pdfminer-20140328/samples/simple1.pdf")
I get this error:
Error:
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-5-57f646d3a440> in <module>()
----> 1 df = read_pdf("/Users/jamesozden/Downloads/pdfminer-20140328/samples/simple1.pdf")
2 #/Users/jamesozden/Desktop/data_scrape_table.pdf
/Users/jamesozden/anaconda/lib/python2.7/site-packages/tabula/wrapper.pyc in read_pdf(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, **kwargs)
83
84 try:
---> 85 output = subprocess.check_output(args)
86
87 except FileNotFoundError as e:
/Users/jamesozden/anaconda/lib/python2.7/subprocess.pyc in check_output(*popenargs, **kwargs)
217 if cmd is None:
218 cmd = popenargs[0]
--> 219 raise CalledProcessError(retcode, cmd, output=output)
220 return output
221
CalledProcessError: Command '['java', '-jar', '/Users/jamesozden/anaconda/lib/python2.7/site-packages/tabula/tabula-1.0.1-jar-with-dependencies.jar', '--pages', '1', '--guess', '/Users/jamesozden/Downloads/pdfminer-20140328/samples/simple1.pdf']' returned non-zero exit status 1
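So I read about some issues related to the Java installation, and I installed Java. I also made sure to add the path to my bash profile (I'm fairly new to this, so I'm not sure I did it correctly). Here is the line I added to my bash profile, in case anyone wants to check that it's okay; it's based on what I got from running which java: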
export PATH="$HOME/usr/bin/java/bin:$PATH"
Any help is much appreciated, thanks!
Answer 0 (score: 0)
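I managed to sort it out myself! It turned out that even though I had downloaded a newer version of Java, version 1.6 was the one actually being used (you can see this by running java -version). I upgraded to Java 8 with Homebrew, and now it works fine.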
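For anyone hitting the same thing, here is a rough sketch of how the check could be done from inside the notebook, so you see the same java that a Python subprocess would launch. It assumes Python 3.5+ (shutil.which and subprocess.run are not available on the Python 2.7 shown in the traceback), and the helper names parse_java_major and java_on_path are made up for illustration; they are not part of tabula-py.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy numbering: "1.6.0_65" means Java 6, "1.8.0_181" means Java 8.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def java_on_path():
    """Return (path, major_version) for the `java` found on this process's PATH."""
    path = shutil.which("java")
    if path is None:
        return None, None
    # `java -version` normally prints its banner to stderr, so capture both streams.
    result = subprocess.run(
        [path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return path, parse_java_major(result.stderr + result.stdout)
```

Calling java_on_path() in a notebook cell returns the resolved executable and its major version. If it reports 6 when you expected 8, the PATH seen by Jupyter is probably still picking up the old install, which matches what happened above.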