Question

[Update]

I installed PyPDFOCR on Windows, along with Tesseract, ImageMagick, and Poppler.

When I run it from Python, the PDF file is converted, but I get this warning:

WARNING: Could not execute identify to calculate DIP (try installing imagemagic?), so defaulting to 300dpi

I installed ImageMagick 7.0.7 in C:\Program Files\ImageMagick-7.0.7-Q16, and my Python script adds that directory to the path at runtime:

 os.environ['PATH'] += os.pathsep + 'C:\\Program Files\\ImageMagick-7.0.7-Q16'

The identify application is located in C:\Program Files\ImageMagick-7.0.7-Q16.

Changing line 146 of pypdfocr_gs.py to read

 cmd = 'magick identify -format "%%w %%x %%h %%y" "%s"' % pdf_filename

returns a non-empty results (lines 149-150 of pypdfocr_gs.py).

However, if C:\myfile.pdf has more than one page, the script fails at line 151:

 width, xdensity, height, ydensity = [float(x) for x in results.split()]

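because it expects exactly 4 values to unpack, but identify reports 4 values for every page, so results holds roughly 4 times the page count.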
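For illustration, here is a minimal sketch of one way the unpacking could be made to tolerate multi-page output. It is not PyPDFOCR's own code. It assumes the -format string has been extended with a trailing \n so that identify prints one line per page, and it assumes the first page's size and density are representative of the whole document. The helper name parse_first_page and the sample string are hypothetical.

```python
def parse_first_page(results):
    """Return (width, xdensity, height, ydensity) for the first page only.

    Assumes `results` holds one line per page, each line containing the
    four whitespace-separated numbers produced by "%w %x %h %y".
    """
    # Drop blank lines so a trailing newline does not count as a page.
    lines = [line for line in results.splitlines() if line.strip()]
    if not lines:
        raise ValueError("identify produced no output")

    fields = lines[0].split()
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d: %r" % (len(fields), lines[0]))

    width, xdensity, height, ydensity = [float(x) for x in fields]
    return width, xdensity, height, ydensity


# Hypothetical identify output for a two-page PDF, one line per page.
sample = "612 72 792 72\n612 72 792 72\n"
print(parse_first_page(sample))
```

Without the newline separator, the values for consecutive pages may run together with no whitespace between them, so simply slicing the first four tokens of results.split() would not be reliable.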