Question

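Does anyone know a way to extract all the JPG images from a PDF file? I'm currently using Acrobat, and I have a file containing roughly 1500 photos that I need to extract, but doing them one at a time would take far too long. Any ideas?

Thanks.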

Answer 1

I did a little searching and found this; I hope it helps. That said, I can't think of any reason to have 1500 images in a PDF.

http://pdf-image-extraction-wizard.lastdownload.com/

Answer 2

There are free utilities that can help you do this. For example, a quick Google search turned up this one.

Answer 3

On a Mac, try the FileJuicer app; it usually does a good job of extracting images from PDFs.

Answer 4

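A code answer (requires tesseract, which is free software). I'm not sure which of these packages I actually used; some of them may be in the same code block for other functions. Note that the extraction itself is done by the pdfimages command-line tool, which the code calls through subprocess.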

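import os
import subprocess

# Only os and subprocess are needed for the extraction step.
# pdfimages is a command-line tool from Poppler (poppler-utils) and must be on PATH.


def image_exporter(pdf_path, output_dir):
    """Strip the images out of pdf_path and put them in output_dir."""
    os.makedirs(output_dir, exist_ok=True)

    # -all writes each image in its native format (JPEG images stay .jpg)
    cmd = ['pdfimages', '-all', pdf_path,
           os.path.join(output_dir, 'prefix')]
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    print('Images extracted:')
    print(os.listdir(output_dir))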
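
Since the question asks specifically for JPG images, here is a rough variant of the same idea that keeps only the JPEG files. It is a sketch under some assumptions, not a tested recipe for your file: it assumes Poppler's pdfimages is installed, it uses the -j option (which writes DCT-encoded images as .jpg and everything else as PPM/PBM), and the helper names build_pdfimages_cmd, list_jpegs and extract_jpegs, as well as the "img" prefix, are made up for illustration.

```python
import os
import shutil
import subprocess


def build_pdfimages_cmd(pdf_path, output_dir, prefix="img"):
    """Build the pdfimages command line (hypothetical helper).

    -j asks pdfimages to save DCT-encoded images as JPEG files.
    """
    return ["pdfimages", "-j", pdf_path, os.path.join(output_dir, prefix)]


def list_jpegs(output_dir):
    """Return the sorted names of the .jpg/.jpeg files in output_dir."""
    return sorted(
        name for name in os.listdir(output_dir)
        if name.lower().endswith((".jpg", ".jpeg"))
    )


def extract_jpegs(pdf_path, output_dir, prefix="img"):
    """Run pdfimages on pdf_path and return the JPEG file names it produced."""
    if shutil.which("pdfimages") is None:
        # Assumption: the tool comes from poppler-utils and must be on PATH.
        raise RuntimeError("pdfimages not found on PATH")
    os.makedirs(output_dir, exist_ok=True)
    subprocess.run(build_pdfimages_cmd(pdf_path, output_dir, prefix), check=True)
    return list_jpegs(output_dir)
```

You would call it as extract_jpegs('photos.pdf', 'out'), where both names are placeholders. Images that are not stored as JPEG inside the PDF will still be written to the directory as PPM/PBM files; they are just left out of the returned list.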

Extracting photos from a PDF file
