I have tried every command I could find in the documentation. How can I get only the text as output, rather than all the images?
https://github.com/coolwanglu/pdf2htmlEX/wiki/Command-Line-Options
Answer 0: (score: 0)
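I'm not sure what you are trying to achieve, because the title and the body of the question seem to contradict each other, but there are options for splitting graphics and text into separate files: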
    --embed <string>
    --embed-css <0|1> (Default: 1)
    --embed-font <0|1> (Default: 1)
    --embed-image <0|1> (Default: 1)
    --embed-javascript <0|1> (Default: 1)
    --embed-outline <0|1> (Default: 1)
        Specify which elements should be embedded into the output HTML
        file.
        If switched off, separated files will be generated along with
        the HTML file for the corresponding elements.
        --embed accepts a string as argument. Each letter of the string
        must be one of `cCfFiIjJoO`, which corresponds to one of the
        --embed-*** switches. Lower case letters for 0 and upper case
        letters for 1. For example, `--embed cFIJo` means to embed
        everything but CSS files and outlines.
    --split-pages <0|1> (Default: 0)
        If turned on, the content of each page is stored in a separated
        file.
        This switch is useful if you want pages to be loaded separately
        & dynamically -- a supporting server might be necessary.
        Also see --page-filename.
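To make the `--embed` shorthand from the excerpt above easier to read, here is a small sketch that expands such a string into the long switches it stands for. The helper name `expand_embed` and the lookup table are my own illustration and are not part of pdf2htmlEX; the letter-to-switch mapping is assumed from the order of the `--embed-***` options listed above.

```python
# Illustration only: expand_embed is a hypothetical helper, not a pdf2htmlEX API.
# Mapping assumed from the excerpt: c=css, f=font, i=image, j=javascript, o=outline.
SWITCHES = {
    "c": "--embed-css",
    "f": "--embed-font",
    "i": "--embed-image",
    "j": "--embed-javascript",
    "o": "--embed-outline",
}


def expand_embed(spec):
    """Return {long switch: 0 or 1} for an --embed string such as 'cFIJo'."""
    result = {}
    for letter in spec:
        switch = SWITCHES.get(letter.lower())
        if switch is None:
            raise ValueError("letter %r is not one of cCfFiIjJoO" % letter)
        # Lower case means 0 (separate file), upper case means 1 (embedded).
        result[switch] = 1 if letter.isupper() else 0
    return result


print(expand_embed("cFIJo"))
# {'--embed-css': 0, '--embed-font': 1, '--embed-image': 1,
#  '--embed-javascript': 1, '--embed-outline': 0}
```

This agrees with the example in the excerpt: `cFIJo` embeds everything except CSS files and outlines.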
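So if you use the `--split-pages 1` and `--embed-image 0` options, each PDF page is written to its own HTML file, and the images are saved as separate files instead of being embedded in the HTML.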
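As a rough sketch, the invocation could be assembled and run from a script like this. The file name `input.pdf` and the helper names `build_command` and `convert` are placeholders of mine; only the executable name and the two options come from the documentation quoted above.

```python
import subprocess


def build_command(pdf_path):
    """Build the pdf2htmlEX argument list (pdf_path is a placeholder input)."""
    return [
        "pdf2htmlEX",
        "--split-pages", "1",   # one output file per PDF page
        "--embed-image", "0",   # write images to separate files
        pdf_path,
    ]


def convert(pdf_path):
    """Run the conversion; assumes pdf2htmlEX is installed and on PATH."""
    subprocess.run(build_command(pdf_path), check=True)


# Show the command line without executing it.
print(" ".join(build_command("input.pdf")))
# pdf2htmlEX --split-pages 1 --embed-image 0 input.pdf
```

Passing the arguments as a list avoids shell quoting problems if the PDF path contains spaces.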
If this is not what you want, please add more information to the question.