Pdf2htmlEx: the HTML contains images; how can I get graphics instead of images as output?

Time: 2018-10-25 07:20:48

Tags: pdf2htmlex

I have tried every command I could find in the documentation. How can I get only the text portion as output, rather than all the images?

https://github.com/coolwanglu/pdf2htmlEX/wiki/Command-Line-Options

1 answer:

Answer 0: (score: 0)

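I'm not sure what you are trying to achieve, because the title and the body of the question look contradictory, but there are options that split graphics and text into separate files: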

   --embed <string>
   --embed-css <0|1> (Default: 1)
   --embed-font <0|1> (Default: 1)
   --embed-image <0|1> (Default: 1)
   --embed-javascript <0|1> (Default: 1)
   --embed-outline <0|1> (Default: 1)
          Specify which elements should be embedded into the output HTML
          file.

          If switched off, separated files will be generated along with
          the HTML file for the corresponding elements.

          --embed accepts a string as argument. Each letter of the string
          must be one of `cCfFiIjJoO`, which corresponds to one of the
          --embed-*** switches. Lower case letters for 0 and upper case
          letters for 1. For example, `--embed cFIJo` means to embed
          everything but CSS files and outlines.

   --split-pages <0|1> (Default: 0)
          If turned on, the content of each page is stored in a separated
          file.

          This switch is useful if you want pages to be loaded separately
          & dynamically -- a supporting server might be necessary.

          Also see --page-filename.
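
The letter encoding described for --embed above (lower case for 0, upper case for 1) can be sketched as a small shell helper. The function name embed_string and the argument order (css, font, image, javascript, outline) are my own assumptions for illustration; they are not part of pdf2htmlEX.

```shell
# Hypothetical helper (not part of pdf2htmlEX): build the --embed argument
# from five 0/1 values given in the order css, font, image, javascript, outline.
embed_string() {
    letters="cfijo"
    out=""
    i=1
    for flag in "$@"; do
        # Pick the i-th letter of "cfijo".
        letter=$(printf '%s' "$letters" | cut -c "$i")
        # Upper case means "embed" (1), lower case means "separate file" (0).
        if [ "$flag" = "1" ]; then
            letter=$(printf '%s' "$letter" | tr 'a-z' 'A-Z')
        fi
        out="$out$letter"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Everything embedded except CSS and outlines, as in the man page example.
embed_string 0 1 1 1 0   # prints cFIJo
```

The result could then be passed along as --embed "$(embed_string 0 1 1 1 0)".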

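So if you use the options --split-pages 1 --embed-image 0, you get one HTML file per PDF page, and the images are written to separate files instead of being embedded in the HTML.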
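
A minimal sketch of such an invocation, assuming the pdf2htmlEX binary is on your PATH. The file name input.pdf is a placeholder I made up; when the tool or the file is missing, the script only prints the command it would run.

```shell
# Placeholder input file name (an assumption, not taken from the question).
input="input.pdf"

# The two switches discussed here: one file per page, and images written
# to separate files next to the HTML instead of being embedded.
opts="--split-pages 1 --embed-image 0"

if command -v pdf2htmlEX >/dev/null 2>&1 && [ -f "$input" ]; then
    # $opts is intentionally unquoted so it splits into separate arguments.
    pdf2htmlEX $opts "$input"
else
    # Dry run when the tool or the file is missing: just show the command.
    echo "pdf2htmlEX $opts $input"
fi
```

The output file naming for the individual pages is controlled by --page-filename, which the excerpt mentions but does not describe, so it is left at its default here.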

If this is not what you want, please add more information to the question.