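I need to convert the contents of a .docx file to HTML text so that it can be displayed in a web UI.
I have tried Apache POI's XWPFDocument class, but have not gotten any result so far; I get an empty string. My code is based on this sample.
Here is my code: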
public JSONObject uploadDocxFile(MultipartFile multipartFile) throws Exception {
    InputStream inputStream = multipartFile.getInputStream();
    XWPFDocument wordDocument = new XWPFDocument(inputStream);
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
    org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource(htmlDocument);
    StringWriter stringWriter = new StringWriter();
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, new StreamResult(stringWriter));
    out.close();
    String result = new String(out.toByteArray());
    String htmlText = result;
    JSONObject jsonObject = new JSONObject();
    jsonObject.put("content", htmlText);
    jsonObject.put("success", true);
    return jsonObject;
}
Answer 0: (score: 1)
Even though this is late, I think the code above can be modified as follows (this works for Word 97 documents):
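import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

private static void convertWordDoc2HTML(File file)
        throws ParserConfigurationException, TransformerException, IOException {
    // Change the type from XWPFDocument to HWPFDocument (binary .doc format).
    // Let an IOException propagate instead of swallowing it and continuing with null.
    HWPFDocument hwpfDocument;
    try (FileInputStream fis = new FileInputStream(file)) {
        hwpfDocument = new HWPFDocument(new POIFSFileSystem(fis));
    }
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
    // Add the processDocument call: without it the converter's DOM stays empty.
    wordToHtmlConverter.processDocument(hwpfDocument);
    org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(out);
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    // Write into the same stream that is read back below.
    serializer.transform(domSource, streamResult);
    // Decode with the charset the serializer wrote (UTF-8), not the platform default.
    String htmlText = new String(out.toByteArray(), StandardCharsets.UTF_8);
    System.out.println(htmlText);
}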
I hope this helps.
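Answer 1: (score: 0)
I am using docx4j to do this, and it seems to work. If you are using Maven, you just need to add the dependency (but use version 3.0.0) and then use one of the samples, named ConvertOutHtml.java. Just change the file path in ConvertOutHtml.java to point to your file and you should be fine.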
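Answer 2: (score: 0)
Your code produces empty HTML output because you never process any document with the converter.
In any case, if the file is a docx, you should use XHTMLConverter to convert it to HTML rather than WordToHtmlConverter. See this answer.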
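As a side note, the code in the question also serializes into a StringWriter and then builds the result from a separate ByteArrayOutputStream that nothing ever wrote to, so the string would likely stay empty even once a document is processed. Below is a minimal JDK-only sketch of that serialization step. The class name DomToHtml and the hand-built DOM in sampleDocument are hypothetical stand-ins for the converter's output; they are not part of Apache POI or of the original code.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper for illustration only; not a POI class.
class DomToHtml {

    // Serializes a DOM tree as HTML and returns what was written to the writer.
    static String serialize(Document htmlDocument) {
        try {
            StringWriter stringWriter = new StringWriter();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(htmlDocument), new StreamResult(stringWriter));
            // Read from the same sink the transformer wrote to.
            return stringWriter.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize the DOM", e);
        }
    }

    // Builds a tiny document standing in for what a converter would return.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent("Hello from the converter");
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DOM document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDocument()));
    }
}
```

In the question's method, the equivalent change would be to set htmlText from stringWriter.toString() and drop the unused ByteArrayOutputStream; the document still has to be passed through a converter first for the DOM to contain anything.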