Question

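I'm trying to extract tables from a series of PDF files, but I can't get tabula-py to work. I've been using it from a Jupyter notebook on Windows. Unfortunately, I get the same "FileNotFoundError" every time I call read_pdf().

From what I've found online so far, this error seems to be raised when tabula-py tries to run the Tabula Java file. I have installed Java correctly.

Any help with this would be greatly appreciated.

Here is the code I'm trying to run: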

    from tabula import read_pdf
    df = read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf")

Error message:

    FileNotFoundError                         Traceback (most recent call last)
    <ipython-input-78-956ad4697ff7> in <module>()
          1 from tabula import read_pdf
    ----> 2 df = read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf")

    C:\Program Files\Anaconda3\lib\site-packages\tabula\wrapper.py in read_pdf(input_path, **kwargs)
         64 
         65     try:
    ---> 66         output = subprocess.check_output(args)
         67     finally:
         68         if is_url:

    C:\Program Files\Anaconda3\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
        624 
        625     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
    --> 626                **kwargs).stdout
        627 
        628 

    C:\Program Files\Anaconda3\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
        691         kwargs['stdin'] = PIPE
        692 
    --> 693     with Popen(*popenargs, **kwargs) as process:
        694         try:
        695             stdout, stderr = process.communicate(input, timeout=timeout)

    C:\Program Files\Anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds)
        945                                 c2pread, c2pwrite,
        946                                 errread, errwrite,
    --> 947                                 restore_signals, start_new_session)
        948         except:
        949             # Cleanup if the child failed starting.

    C:\Program Files\Anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
       1222                                          env,
       1223                                          cwd,
    -> 1224                                          startupinfo)
       1225             finally:
       1226                 # Child is launched. Close the parent's copy of those pipe

Answer 1

I can reproduce this issue when the PATH environment variable does not include java.exe. Make sure PATH is set for Java. See also: https://www.java.com/en/download/help/path.xml
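One quick way to check this from the same notebook is to ask Python whether it can resolve `java` on the PATH. This is a minimal sketch using only the standard library; the helper name `find_java` is made up for illustration and is not part of tabula-py.

```python
import shutil

def find_java(executable="java"):
    """Return the full path to the Java launcher, or None if it is not on PATH."""
    return shutil.which(executable)

java_path = find_java()
if java_path is None:
    # A subprocess call that tries to launch "java" would fail in this case.
    print("java was not found on PATH")
else:
    print("java found at:", java_path)
```

If this prints that java was not found, the notebook process cannot see Java, whatever the installer reported.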

Answer 2

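To make sure Windows can find the Java compiler and interpreter: select Start -> Computer -> System Properties -> Advanced system settings -> Environment Variables -> System variables -> Path. [In Vista, select Start -> My Computer -> Properties -> Advanced -> Environment Variables -> System variables -> Path.]

[In Windows XP, select Start -> Control Panel -> System -> Advanced -> Environment Variables -> System variables -> Path.]

Prepend C:\Program Files\Java\jdk1.6.0_27\bin; to the beginning of the PATH variable. Click OK three times.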
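If changing the system settings is not an option, a similar effect can be approximated for the current Python process only. The sketch below prepends a directory to PATH through `os.environ`. The JDK directory is the one quoted in this answer and will differ on other machines, and the helper name `prepend_to_path` is hypothetical.

```python
import os

def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH in `env` (defaults to os.environ).

    If the directory is already listed, PATH is left in its current order.
    """
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]

# Assumed install location, taken from the answer above; adjust to your JDK.
prepend_to_path(r"C:\Program Files\Java\jdk1.6.0_27\bin")
```

This change affects only the running process and the child processes it starts afterwards. It does not persist after the notebook kernel is restarted.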

Answer 3

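    import tabula
    df = tabula.read_pdf("Danamon - FS - FY-18.pdf", pages=i, guess=False)

Adding the extra parameter guess=False worked for me. (Here i is the page number to read, defined elsewhere in my code.)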

Answer 4

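I ran into the same problem on Ubuntu.

First, check the versions of the JDK and JRE installed on your machine by running java --version and javac --version (on Java 8 and earlier, use java -version and javac -version instead). Both versions should be greater than 7.

Then install tabula-py using pip3.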
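The version check can also be done from Python. The following is a rough sketch, not part of tabula-py: the function names are invented for illustration, and the parsing assumes the usual `version "..."` line that `java -version` prints.

```python
import re
import subprocess

def parse_java_major(version_output):
    """Extract the major Java version from `java -version` style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_201" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major

def installed_java_major():
    """Run `java -version`; return the major version, or None if java cannot be started."""
    try:
        result = subprocess.run(["java", "-version"],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                universal_newlines=True)
    except OSError:
        return None
    # `java -version` writes its report to stderr, so look at both streams.
    return parse_java_major(result.stderr + result.stdout)

print(installed_java_major())
```

A result of `None` means either that java could not be launched or that its output did not match the assumed format.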

Answer 5

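I did the following and then ran the code:

    # Shell commands (run in a terminal, not inside Python):
    pip install tabula-py
    conda install tabula-py
    conda install java

    # Python (nospreadsheet comes from older tabula-py releases; newer ones may expect stream=True):
    from tabula import read_pdf
    import pandas as pd
    dt = read_pdf('file.pdf', encoding='latin1', pages='all', nospreadsheet=True)

Good luck.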

Answer 6

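This happens because the PDF file cannot be found. First, locate the PDF file by giving an absolute or a relative path.

    import os
    print(os.getcwd())           # current working directory
    print(os.listdir('Other'))   # contents of the 'Other' subdirectory

    # Build an absolute path by joining the working directory and the file name
    file = os.path.join(os.getcwd(), 'test.pdf')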

Python tabula-py does not read PDF
