Unable to convert docx to HTML using Java

Date: 2013-01-22 12:00:00

Tags: html apache-poi document docx

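I need to convert the contents of a .docx file to HTML text so it can be displayed in a web UI.

I tried the Apache POI XWPFDocument class but have not gotten any result so far; I get an empty string. My code is based on this sample.

Here is my code: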

public JSONObject uploadDocxFile(MultipartFile multipartFile) throws Exception {
        InputStream inputStream = multipartFile.getInputStream();
        XWPFDocument wordDocument = new XWPFDocument(inputStream);

        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StringWriter stringWriter = new StringWriter();

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, new StreamResult(stringWriter));
        out.close();

        String result = new String(out.toByteArray());
        String htmlText = result;

        JSONObject jsonObject = new JSONObject();
        jsonObject.put("content", htmlText);
        jsonObject.put("success", true);
        return jsonObject;
    }

3 Answers:

Answer 0 (score: 1)

Even though it is late, I think the code above can be modified this way (it works for Word 97 documents):

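    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;

    private static void convertWordDoc2HTML(File file)
            throws ParserConfigurationException, TransformerException, IOException {
        // Change the type from XWPFDocument to HWPFDocument.
        // Let an IOException propagate instead of continuing with a null document.
        HWPFDocument hwpfDocument;
        try (InputStream fis = new FileInputStream(file)) {
            hwpfDocument = new HWPFDocument(new POIFSFileSystem(fis));
        }

        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        // Add the processDocument call; without it no Word content is converted.
        wordToHtmlConverter.processDocument(hwpfDocument);
        org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);

        // Decode with the same charset the serializer was told to write.
        String htmlText = new String(out.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(htmlText);
    }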

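I hope this helps.

Answer 1 (score: 0)

I am using docx4j to do this, and it seems to work. If you are using Maven, you can simply add the dependency (using version 3.0.0, though), then use one of the samples, named ConvertOutHtml.java. Just change the file path in ConvertOutHtml.java to point to your file and you should be fine.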

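Answer 2 (score: 0)

Your code produces empty HTML output because you never process any document in the converter.

In any case, if it is a docx, you should use XHTMLConverter to convert it to HTML rather than WordToHtmlConverter. See this answer.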
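
One more detail in the question's code may be worth checking, separate from the missing processing step: the transform appears to write into a StringWriter, while the result string is read from a ByteArrayOutputStream that nothing ever wrote to. The sketch below reproduces that pattern using only JDK classes. The class name, the method names, and the hand-built sample DOM are hypothetical stand-ins for a document that a converter has already populated; this is an illustration of the stream mismatch, not a docx conversion.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SerializeDomSketch {

    /** Hypothetical stand-in for a DOM that a converter has already filled. */
    static Document sampleDocument() throws Exception {
        Document document =
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = document.createElement("html");
        Element body = document.createElement("body");
        Element paragraph = document.createElement("p");
        paragraph.setTextContent("Hello from a DOM");
        body.appendChild(paragraph);
        html.appendChild(body);
        document.appendChild(html);
        return document;
    }

    /** Builds a serializer configured for HTML output without indentation. */
    static Transformer htmlSerializer() throws Exception {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.setOutputProperty(OutputKeys.INDENT, "no");
        return serializer;
    }

    /** Pattern from the question: write to one sink, read from another. */
    static String readFromUnusedStream(Document document) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        StringWriter stringWriter = new StringWriter();
        htmlSerializer().transform(new DOMSource(document), new StreamResult(stringWriter));
        // 'out' never received any bytes, so this string has no content.
        return new String(out.toByteArray());
    }

    /** Reads the result from the same sink the transform wrote to. */
    static String toHtml(Document document) throws Exception {
        StringWriter stringWriter = new StringWriter();
        htmlSerializer().transform(new DOMSource(document), new StreamResult(stringWriter));
        return stringWriter.toString();
    }

    public static void main(String[] args) throws Exception {
        Document document = sampleDocument();
        System.out.println("mismatched sinks empty: " + readFromUnusedStream(document).isEmpty());
        System.out.println("matching sink has text: " + toHtml(document).contains("Hello from a DOM"));
    }
}
```

If this reading of the question's code is right, an empty string would come back even after a processing call is added, so returning `stringWriter.toString()` (or writing into the byte stream that is later read) seems like a sensible second fix to try.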