Converting a document to HTML with Apache POI cuts off the resulting HTML file

Asked: 2018-11-20 15:04:28

Tags: java apache-poi

I am using Apache POI 4.0.0 to convert .doc files to .html.

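    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.io.StringWriter;
    import java.io.UncheckedIOException;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.PicturesManager;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.hwpf.usermodel.PictureType;
    import org.w3c.dom.Document;

    // The class name is a placeholder; the original post shows only the method.
    public class DocToHtml {

        private static String ProcessingDoc(File doc, String imagedir)
                throws IOException, ParserConfigurationException, TransformerException {
            Document html_file = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            WordToHtmlConverter converter = new WordToHtmlConverter(html_file);

            // Save each embedded picture into imagedir and reference it by file name.
            converter.setPicturesManager(new PicturesManager() {
                @Override
                public String savePicture(byte[] content, PictureType pictureType, String suggestedName,
                        float widthInches, float heightInches) {
                    // Create the directory that is actually written to (the original created
                    // the document's parent directory via an undefined getParentDirectory helper).
                    File imgDir = new File(imagedir);
                    if (!imgDir.exists()) {
                        imgDir.mkdirs();
                    }
                    try (OutputStream out = new FileOutputStream(new File(imgDir, suggestedName))) {
                        out.write(content);
                    } catch (IOException e) {
                        // savePicture declares no checked exceptions, so rethrow unchecked.
                        throw new UncheckedIOException(e);
                    }
                    return suggestedName;
                }
            });

            // Close the input stream and the document once the conversion is done.
            try (InputStream in = new FileInputStream(doc);
                    HWPFDocument doc_file = new HWPFDocument(in)) {
                converter.processDocument(doc_file);
            }

            StringWriter stringWriter = new StringWriter();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            // Let TransformerException propagate instead of printing it and returning partial output.
            transformer.transform(new DOMSource(converter.getDocument()), new StreamResult(stringWriter));
            return stringWriter.toString();
        }

    }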

However, POI produces incomplete HTML files that are cut off at different places in the document. The output looks like this:

<some text of html document>
                <tr class="r1">
                    <td class="td49">
                        <p class="p17"></p>
                    </td><td class="td50">
                        <p class="p17"></p>
                    </td><td class="td51">

That is where the HTML file ends. No errors are reported during the conversion.

Why do I get no error even though the file is incomplete?
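One thing worth ruling out, stated purely as an assumption because the code that saves the string returned by `ProcessingDoc` is not shown: output that stops at an arbitrary point without any exception is a typical symptom of a buffered writer that is never flushed or closed. A minimal sketch of a save step that always closes the writer, using only the JDK (the class `HtmlFileWriter` and the method `writeHtml` are hypothetical names, not part of POI):

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HtmlFileWriter {

    // Hypothetical helper: writes the HTML string to the target file as UTF-8.
    // try-with-resources closes the writer, which also flushes its buffer,
    // so the tail of the document cannot be left behind in memory.
    static void writeHtml(String html, Path target) throws IOException {
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(html);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by the conversion method.
        String html = "<html><body><table><tr><td>cell</td></tr></table></body></html>";
        Path target = Files.createTempFile("converted", ".html");
        writeHtml(html, target);

        // Read the file back to confirm that nothing was cut off.
        String readBack = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        System.out.println("complete: " + readBack.equals(html));
    }
}
```

If the file is still truncated with a save step like this, the writer is probably not the cause, and comparing the length of the returned string with the size of the file on disk would show whether the content is lost during the conversion or while saving.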

Thanks for your answers!

0 answers:

No answers yet