Question

I am using the org.apache.poi.xwpf.converter.xhtml.XHTMLConverter class to convert docx to HTML. Here is my Groovy code:

public Map convert(String wordDocPath, String htmlPath,
        Map conversionParams)
{
    log.info("Converting word file "+wordDocPath)
    try
    {
        ...
        // Backslashes must be escaped in a Groovy string literal; otherwise \t is read as a tab.
        String notificationWorkingFolder = "C:\\tomcats\\Notification\\store\\Notification1234"

        FileInputStream fis = new FileInputStream(wordDocPath);
        XWPFDocument document = new XWPFDocument(fis);
        XHTMLOptions options = XHTMLOptions.create().URIResolver(new FileURIResolver(new File(notificationWorkingFolder)));
        File htmlFile = new File(htmlPath);
        OutputStream out = new FileOutputStream(htmlFile)
        XHTMLConverter.getInstance().convert(document, out, options);

        log.info("Converted to HTML file "+htmlPath)

        return [success:true,htmlFileName:getFileName(htmlPath)]
    }
    catch(Exception e)
    {
        log.error("Exception :"+e.getMessage(),e)
        return [success:false]
    }

}

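The code above converts the docx to HTML successfully, but if the docx contains any images, it emits <img src="C:\tomcats\Notification\store\Notification1234\word\media\image1.png"> without copying the image into that folder. As a result, all images show up blank when I open the HTML. Am I missing something in my code? Also, is there a way to generate the image source link as a URL instead of an absolute path, such as <img src="http://localhost:8080/webapp/image1.png">?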

Answer 1

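I found the answer to my first question at lychaox.com/java/poi/Word07toHtml.html. I had to add one line of code, options.setExtractor(new FileImageExtractor(imageFolderFile));, so that the images are actually written to disk. I solved the second problem with a pattern-based search and replace on the generated HTML.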
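
The answer does not show the search-and-replace step, so here is a minimal sketch of one way it could be done in plain Java, using only java.util.regex. The class name ImgSrcRewriter and the method rewrite are hypothetical, and the sketch assumes that the extracted images are served directly under the base URL by file name, as in the example URL from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI or XDocReport.
public class ImgSrcRewriter {

    // Captures the prefix up to src=", the path itself, and the closing quote.
    private static final Pattern IMG_SRC =
            Pattern.compile("(<img\\b[^>]*?\\bsrc=\")([^\"]*)(\")", Pattern.CASE_INSENSITIVE);

    /** Replaces each img src path with baseUrl + "/" + the file name of that path. */
    public static String rewrite(String html, String baseUrl) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String path = m.group(2);
            // Keep only what follows the last '/' or '\' (assumption: flat URL layout).
            int cut = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
            String fileName = path.substring(cut + 1);
            String replacement = m.group(1) + baseUrl + "/" + fileName + m.group(3);
            // quoteReplacement keeps '\' and '$' in the replacement literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<p><img src=\"C:\\tomcats\\Notification\\store\\"
                + "Notification1234\\word\\media\\image1.png\"></p>";
        System.out.println(rewrite(html, "http://localhost:8080/webapp"));
    }
}
```

Because only the file name is kept, two images with the same name in different folders would collide, so this fits only when the web application exposes the extracted media files in a single directory.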

Answer 2

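Even when it is used correctly, be aware that XHTMLConverter uses XHTMLMapper, which does not process headers, footers, or VML images. Any images that fall into those categories will be lost.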

PDFConverter is more full-featured, but it also depends on iText, a GPL-licensed library.
