Question

I am trying to use pdf2image, and it seems I need something called poppler:

(sum_env) C:\Users\antoi\Documents\Programming\projects\summarizer>python ocr.py -i fr13_idf.pdf
Traceback (most recent call last):
  File "c:\Users\antoi\Documents\Programming\projects\summarizer\sum_env\lib\site-packages\pdf2image\pdf2image.py", line 165, in __page_count
    proc = Popen(["pdfinfo", pdf_path], stdout=PIPE, stderr=PIPE)
  File "C:\Python37\lib\subprocess.py", line 769, in __init__
    restore_signals, start_new_session)
  File "C:\Python37\lib\subprocess.py", line 1172, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ocr.py", line 53, in <module>
    pdfspliterimager(image_path)
  File "ocr.py", line 32, in pdfspliterimager
    pages = convert_from_path("document-page%s.pdf" % i, 500)
  File "c:\Users\antoi\Documents\Programming\projects\summarizer\sum_env\lib\site-packages\pdf2image\pdf2image.py", line 30, in convert_from_path
    page_count = __page_count(pdf_path, userpw)
  File "c:\Users\antoi\Documents\Programming\projects\summarizer\sum_env\lib\site-packages\pdf2image\pdf2image.py", line 169, in __page_count
    raise Exception('Unable to get page count. Is poppler installed and in PATH?')
Exception: Unable to get page count. Is poppler installed and in PATH?

I tried this link, but what I downloaded did not solve my problem.

Answer 1

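pdf2image is only a wrapper around poppler (not propeller!). To use the module, you need poppler-utils installed on your machine and available in your PATH.

The procedure is linked in the "How to install" section of the project's README.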
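Before and after installing, it can help to check what Python actually finds on PATH. This is a minimal sketch using only the standard library. The helper name find_poppler_tools is hypothetical; pdfinfo is the tool named in the traceback, and pdftoppm is another Poppler command-line tool that pdf2image relies on.

```python
import shutil

def find_poppler_tools(tools=("pdfinfo", "pdftoppm")):
    """Map each Poppler tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}

# Print one line per tool so a missing executable is easy to spot.
for tool, location in find_poppler_tools().items():
    print(f"{tool}: {location or 'NOT FOUND on PATH'}")
```

If pdfinfo shows as not found, pdf2image will raise the same "Unable to get page count" error as in the question.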

Answer 2

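Both the pdf2image and pdftotext libraries require Poppler as a backend, so you have to install it:

conda install -c conda-forge poppler

The error should then be resolved. If it still does not work for you, you can install the library by following http://blog.alivate.com.au/poppler-windows/.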

Answer 3

On Windows, to resolve PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?:

Install chocolatey: https://chocolatey.org/install
Then use choco to install poppler: choco install poppler

Answer 4

First download Poppler from here, then extract it. In your code, pass poppler_path = r'C:\Program Files\poppler-0.68.0\bin' (for example) to convert_from_path.

That's it. With this trick there is no need to add an environment variable. Let me know if you have any problems.
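The call described in this answer might look like the sketch below. It assumes pdf2image is installed and that Poppler was extracted to the folder given above. The helper name pdf_to_images and its default dpi are hypothetical choices for illustration.

```python
import os

# Folder from the answer above; adjust it to wherever you extracted Poppler.
POPPLER_BIN = r"C:\Program Files\poppler-0.68.0\bin"

def pdf_to_images(pdf_path, poppler_bin=POPPLER_BIN, dpi=200):
    """Convert a PDF to a list of PIL images, pointing pdf2image at Poppler."""
    if not os.path.isdir(poppler_bin):
        raise FileNotFoundError(f"Poppler bin folder not found: {poppler_bin}")
    # Imported here so the folder check above runs even without pdf2image.
    from pdf2image import convert_from_path
    return convert_from_path(pdf_path, dpi=dpi, poppler_path=poppler_bin)
```

Checking the folder first gives a clearer message than the generic page-count error when the path is mistyped.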

Answer 5

Poppler is not installed properly. Use this command to install the correct package:

sudo apt-get install poppler-utils
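To confirm the install worked, you can ask the tool for its version from Python. This is a sketch with a hypothetical helper name, poppler_version. It assumes the tool accepts -v to print its version, and it reads both output streams because the version banner may be written to stderr.

```python
import subprocess

def poppler_version(tool="pdfinfo"):
    """Return the first line printed by `<tool> -v`, or None if the tool is missing."""
    try:
        result = subprocess.run([tool, "-v"], capture_output=True, text=True, timeout=10)
    except FileNotFoundError:
        # Same failure as in the question's traceback: the executable is not on PATH.
        return None
    output = (result.stderr + result.stdout).strip()
    return output.splitlines()[0] if output else ""

print(poppler_version() or "pdfinfo was not found on PATH")
```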

Answer 6

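I had the same problem, but I fixed it in my Django project by changing the directory. First you need to store the PDF file in the media directory. Then you need to change the current directory to that media directory (where the PDF file is stored). This is the code snippet I use in my Django project to convert a .pdf into a .jpg: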

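import os

from PIL import Image
from pdf2image import convert_from_path

def convert_pdf_2_image(uploaded_image_path, uploaded_image, img_size):
    # Remember where we started so we can switch back afterwards.
    project_dir = os.getcwd()
    # Work inside the media directory that holds the uploaded PDF.
    os.chdir(uploaded_image_path)
    try:
        file_name = str(uploaded_image).replace('.pdf', '')
        output_file = file_name + '.jpg'
        pages = convert_from_path(uploaded_image, 200)
        # Only the first page is saved as a JPEG.
        pages[0].save(output_file, 'JPEG')
        # Resize while still in the media directory so the relative path resolves.
        img = Image.open(output_file)
        # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter.
        img = img.resize(img_size, Image.LANCZOS)
        img.save(output_file)
    finally:
        # Always restore the original working directory.
        os.chdir(project_dir)
    return output_file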
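Changing the working directory is one way to make a relative file name resolve. An alternative sketch passes full paths instead and asks pdf2image for the first page only, through its first_page and last_page arguments. The function names jpg_path_for and first_page_to_jpg are hypothetical.

```python
import os

def jpg_path_for(pdf_path):
    """Return the .jpg path that sits next to the given .pdf file."""
    root, _ext = os.path.splitext(pdf_path)
    return root + ".jpg"

def first_page_to_jpg(pdf_path, dpi=200):
    """Render only page 1 of pdf_path to a JPEG beside it, without os.chdir."""
    from pdf2image import convert_from_path  # requires pdf2image and Poppler
    pages = convert_from_path(pdf_path, dpi=dpi, first_page=1, last_page=1)
    output_file = jpg_path_for(pdf_path)
    pages[0].save(output_file, "JPEG")
    return output_file
```

Rendering a single page also avoids converting the whole document when only a thumbnail is needed.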

Answer 7

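I am using Visual Studio Code on a Mac and ran into this error. I followed the installation instructions and was able to verify that the package was installed, but the error persisted when running in VSC.

Even though I had specified python.pythonPath and python.condaPath in my settings.json, it did not work until I activated the conda environment inside the VSC integrated terminal itself:

conda activate my_env

The error went away.

Strange.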
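One way to see whether the editor is running the interpreter and environment you expect is to print them from the script itself. This is a sketch using only the standard library. CONDA_DEFAULT_ENV is the variable conda sets when an environment is activated, and the function name describe_runtime is hypothetical.

```python
import os
import shutil
import sys

def describe_runtime():
    """Collect the details that decide whether pdf2image can find Poppler."""
    return {
        "python": sys.executable,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None if no env is active
        "pdfinfo": shutil.which("pdfinfo"),                # None if not on PATH
    }

for key, value in describe_runtime().items():
    print(f"{key}: {value}")
```

If conda_env is empty when run from the editor but set when run from an activated terminal, the two are not using the same PATH.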

Answer 8

After downloading poppler, do this:

import os
os.environ["PATH"] = r"C:.....\poppler-xxxxxxx\bin"

Use it to set up the environment. Hope it works. It worked for me.
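Note that assigning to os.environ["PATH"] replaces the whole search path for the running process. A hedged variant prepends the Poppler folder instead, keeping the existing entries. The function name add_to_path is hypothetical, and the folder you pass stands in for your real Poppler bin directory.

```python
import os

def add_to_path(folder):
    """Prepend folder to this process's PATH unless it is already listed."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    if folder not in entries:
        # Drop empty entries and put the new folder first so it wins lookups.
        os.environ["PATH"] = os.pathsep.join([folder] + [e for e in entries if e])
    return os.environ["PATH"]
```

Call add_to_path with your Poppler bin folder before calling convert_from_path. The change applies only to the current Python process.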

Answer 9

On Windows

Install Poppler for Windows.

500 = the DPI (resolution) of the JPG output
path = the folder containing the PDF files

pip install pdf2image

import os
from pdf2image import convert_from_path

path = r'C:\ABC\FEF\KLH\pdf_extractor\output\break'

def spliting_pdf2img(path):
    # Convert every PDF in the folder; each page becomes its own JPG.
    for file in os.listdir(path):
        if file.lower().endswith(".pdf"):
            pages = convert_from_path(os.path.join(path, file), 500,
                                      poppler_path=r'C:\ABC\DEF\Downloads\poppler-0.68.0\bin')
            base = os.path.splitext(file)[0]
            for number, page in enumerate(pages, start=1):
                # Number the files so later pages do not overwrite earlier ones.
                page.save(os.path.join(path, f"{base}_{number}.jpg"), 'JPEG')

On Linux/Ubuntu, install the following package from the terminal:

sudo apt-get update

sudo apt-get install poppler-utils

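import os
from pdf2image import convert_from_path

# Replace with the folder that contains your PDF files.
path = r'C:\ABC\FEF\KLH\pdf_extractor\output\break'

def spliting_pdf2img(path):
    # No poppler_path is needed here because apt puts Poppler on PATH.
    for file in os.listdir(path):
        if file.lower().endswith(".pdf"):
            pages = convert_from_path(os.path.join(path, file), 500)
            base = os.path.splitext(file)[0]
            for number, page in enumerate(pages, start=1):
                # Number the files so later pages do not overwrite earlier ones.
                page.save(os.path.join(path, f"{base}_{number}.jpg"), 'JPEG')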

Poppler in PATH for pdf2image
