Question

I am using poppler's pdftohtml to convert a PDF to HTML. I am trying to run the executable from Python.

import subprocess

subprocess.Popen([r"D:/poppler-0.68.0/bin/pdftohtml.exe" , 'name.pdf', 'name.html'])

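With the code above, I get the HTML file plus an image (.jpg) for each page of the PDF.

I only need the HTML file, not the images. What changes or arguments should I add to get the expected result?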

Answer 1

According to their documentation, there are two options that might help you solve this:

-i ignore images

and

-s generate single HTML that includes all pages

If neither of these works, there is not much else you can do with pdftohtml itself.
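To pass these options from Python, add each flag as its own list element before the input file name. The following is a minimal sketch, not a tested recipe for this exact poppler build: the helper name `build_pdftohtml_command` and its parameters are hypothetical, and it assumes `-i` and `-s` behave as the documentation quoted above describes.

```python
import subprocess


def build_pdftohtml_command(exe, pdf_path, html_path,
                            ignore_images=True, single_file=False):
    """Build the argument list for pdftohtml (hypothetical helper)."""
    cmd = [exe]
    if ignore_images:
        cmd.append("-i")  # -i: ignore images
    if single_file:
        cmd.append("-s")  # -s: generate single HTML that includes all pages
    # Options go before the input and output file names.
    cmd += [pdf_path, html_path]
    return cmd


def convert(exe, pdf_path, html_path, **options):
    """Run pdftohtml and raise CalledProcessError if it exits with an error."""
    cmd = build_pdftohtml_command(exe, pdf_path, html_path, **options)
    subprocess.run(cmd, check=True)


# Example call, using the path from the question:
# convert(r"D:/poppler-0.68.0/bin/pdftohtml.exe", "name.pdf", "name.html")
```

Unlike `subprocess.Popen`, `subprocess.run` with `check=True` waits for the conversion to finish and raises an exception on a non-zero exit code, so a failed run does not go unnoticed.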