Question

I am trying to convert my PDF file to PNG files using the Python library pdf2image. I use the following code to convert the PDF.

from pdf2image import convert_from_path, convert_from_bytes
pdf_file_path = './samples/my_pdf.pdf'
images = convert_from_path(pdf_file_path)

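I want to do this so that I can later use pytesseract to convert the PDF into text strings.

The problem I keep running into is the following FileNotFoundError, even though the file is at the correct path. Can anyone help me figure out what I am doing wrong?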

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-9-0b7f9e29e79a> in <module>()
      1 from pdf2image import convert_from_path, convert_from_bytes
      2 pdf_file_path = './samples/my_pdf.pdf'
----> 3 images = convert_from_path(pdf_file_path)

C:\Users\hamza.ameur\AppData\Local\Continuum\anaconda3\lib\site-packages\pdf2image\pdf2image.py in convert_from_path(pdf_path, dpi, output_folder, first_page, last_page, fmt)
     22     uid, args, parse_buffer_func = __build_command(['pdftoppm', '-r', str(dpi), pdf_path], output_folder, first_page, last_page, fmt)
     23 
---> 24     proc = Popen(args, stdout=PIPE, stderr=PIPE)
     25 
     26     data, err = proc.communicate()

C:\Users\hamza.ameur\AppData\Local\Continuum\anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    707                                 c2pread, c2pwrite,
    708                                 errread, errwrite,
--> 709                                 restore_signals, start_new_session)
    710         except:
    711             # Cleanup if the child failed starting.

C:\Users\hamza.ameur\AppData\Local\Continuum\anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
    995                                          env,
    996                                          os.fspath(cwd) if cwd is not None else None,
--> 997                                          startupinfo)
    998             finally:
    999                 # Child is launched. Close the parent's copy of those pipe

FileNotFoundError: [WinError 2] The system cannot find the file specified

Answer 1

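Sorry for the late reply.

Cause

After digging into the pdf2image source code, I found that the error is caused by the *nix command pdfinfo, which pdf2image relies on. As a result, using this package on Windows, where the pdfinfo command is missing, produces the error above. (The traceback in the question fails the same way when launching pdftoppm.)

Code from pdf2image: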

#inside __page_count() function
    ...
    else:
        proc = Popen(["pdfinfo", pdf_path], stdout=PIPE, stderr=PIPE)
    ...

From the code above, you can see that it launches pdfinfo as a subprocess to get the page count of the PDF file.
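To confirm that this is the cause on your machine, you can check whether the Poppler commands are visible on PATH before calling pdf2image. This is a minimal sketch using only the standard library; the helper name `find_missing_tools` is made up for illustration, and the tool list is taken from the snippet above and the traceback in the question.

```python
import shutil

# Command-line tools that pdf2image launches as subprocesses
# (pdfinfo from the snippet above, pdftoppm from the question's traceback).
POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def find_missing_tools(tools=POPPLER_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All Poppler tools were found on PATH.")
```

If either name is reported as missing, the subprocess call cannot start and Windows raises `[WinError 2]`, regardless of whether the PDF path itself is correct.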

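Solution

Download the Windows build of the Poppler tools from: http://blog.alivate.com.au/poppler-windows/

Unzip it and add the location of its bin folder (for example C:\somepath\poppler-0.67.0_x86\poppler-0.67.0\bin) to your PATH environment variable.

Restart CMD and your Python virtualenv if you have them open.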

Answer 2

Try using the full path.

Example

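import os
from pdf2image import convert_from_path

# Build an absolute path relative to the directory containing this script.
basePath = os.path.dirname(os.path.realpath(__file__))
pdf_file_path = os.path.join(basePath, "samples", "my_pdf.pdf")
images = convert_from_path(pdf_file_path)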
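Note that `__file__` is not defined in a Jupyter/IPython session, which is where the traceback in the question comes from. A possible variant for that case, sketched here with a hypothetical helper `resolve_pdf_path`, resolves the path against the current working directory and fails early with the full path in the message:

```python
from pathlib import Path


def resolve_pdf_path(relative_path, base_dir=None):
    """Return an absolute path to the PDF, raising early if it is missing."""
    # Fall back to the current working directory when no base is given,
    # since __file__ is unavailable in interactive sessions.
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    full_path = (base / relative_path).resolve()
    if not full_path.is_file():
        raise FileNotFoundError(f"No PDF at {full_path}")
    return full_path
```

Calling `resolve_pdf_path("samples/my_pdf.pdf")` before the conversion helps tell the two failure modes apart: if this check passes and `convert_from_path` still raises FileNotFoundError, the missing file is most likely a Poppler executable rather than the PDF.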

Answer 3

If you are using Google Colab

First run a cell with the following command:

!apt-get install poppler-utils

Here is a complete example notebook that installs the dependencies, downloads a sample PDF, and then converts it to images with pdf2image for display.

https://colab.research.google.com/drive/10doc9xwhFDpDGNferehBzkQ6M0Un-tYq

Answer 4

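I ran into this problem when running Python 2.

After taking another look, I found that the PyPI page explicitly states that the code is not compatible with Python 2.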

FileNotFoundError when using the convert_from_path() function of the pdf2image package
