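I have a String variable containing formatted HTML text, and I need to convert it to a .doc file using Apache POI.
I already have a working solution for .docx files using docx4j, but the client wants this done with Apache POI, i.e. a solution that converts an HTML string to both .doc and .docx.
So how can I convert a formatted HTML string to .doc and .docx files using Apache POI in Spring Boot?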
Edit: solution
For .doc (takes the .doc bytes and returns the HTML text):
// Requires poi-scratchpad (HWPFDocument, WordToHtmlConverter); the XML classes come from the JDK.
private String getDocHtmlText(byte[] contents)
        throws IOException, ParserConfigurationException, TransformerException {
    // Read the .doc bytes straight from memory instead of a fixed-name file,
    // which could collide when two requests run at the same time.
    HWPFDocument wordDocument = new HWPFDocument(new ByteArrayInputStream(contents));

    // Convert the Word document into an in-memory HTML DOM.
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    WordToHtmlConverter converter = new WordToHtmlConverter(doc);
    converter.processDocument(wordDocument);

    // Serialize the DOM as UTF-8 encoded HTML.
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    Transformer serializer = TransformerFactory.newInstance().newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(new DOMSource(converter.getDocument()), new StreamResult(output));

    // Decode with the same charset the serializer wrote, not the platform default.
    return new String(output.toByteArray(), StandardCharsets.UTF_8);
}
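The last step of the .doc method (turning a DOM tree into an HTML string) can be tried on its own with nothing but the JDK. The following is a minimal sketch, not the poster's code: the class name DomToHtmlSketch and the helper sampleDocument are hypothetical, and the small hand-built DOM stands in for whatever WordToHtmlConverter.getDocument() would return. It decodes the bytes with the same charset the serializer was told to write.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class for illustration; not part of the original post.
class DomToHtmlSketch {

    // Serializes a DOM tree with the "html" output method and returns it as a String.
    static String toHtml(Document doc) throws Exception {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(new DOMSource(doc), new StreamResult(output));
        // Decode with the charset requested above rather than the platform default.
        return new String(output.toByteArray(), StandardCharsets.UTF_8);
    }

    // Builds a tiny html/body/p tree as a stand-in for the converter's DOM (assumption).
    static Document sampleDocument(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent(text);
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(sampleDocument("Hello from the DOM")));
    }
}
```

Calling output.toString() without a charset would decode with the JVM default, which may differ from UTF-8 on some servers, so the explicit charset is the safer choice here.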
For .docx (takes the .docx bytes and returns the HTML text):
// Requires poi-ooxml (XWPFDocument) and the XDocReport XHTML converter
// (XHTMLConverter, XHTMLOptions, FileURIResolver).
private String getDocxHtmlText(byte[] contents) throws IOException {
    // try-with-resources closes both streams even if the conversion throws.
    try (InputStream in = new ByteArrayInputStream(contents);
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        XWPFDocument document = new XWPFDocument(in);
        // Image references in the generated HTML are resolved against word/media.
        XHTMLOptions options = XHTMLOptions.create().URIResolver(new FileURIResolver(new File("word/media")));
        XHTMLConverter.getInstance().convert(document, out, options);
        // Assumes the converter writes UTF-8; adjust if your version is configured otherwise.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
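Both methods above read a Word file and return HTML. For the opposite direction asked about in the question (HTML string to Word), I am not aware of an HTML importer in core Apache POI, so one possible route is to parse the HTML yourself and write each piece of text as a run. The sketch below covers only the parsing half, using the HTML parser bundled with the JDK (javax.swing.text.html), and handles just bold and italic. The names HtmlRunsSketch and StyledRun are hypothetical. Each resulting run could then be written with XWPFDocument.createParagraph().createRun() and the run's setText, setBold and setItalic methods; that part is left out here because it needs POI on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper for illustration; not part of Apache POI or the original post.
class HtmlRunsSketch {

    // One piece of text plus the inline formatting active where it appeared.
    static final class StyledRun {
        final String text;
        final boolean bold;
        final boolean italic;

        StyledRun(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    // Walks the HTML and records every text chunk with its bold/italic state.
    static List<StyledRun> parse(String html) throws IOException {
        final List<StyledRun> runs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int boldDepth = 0;
            private int italicDepth = 0;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.B || tag == HTML.Tag.STRONG) boldDepth++;
                if (tag == HTML.Tag.I || tag == HTML.Tag.EM) italicDepth++;
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.B || tag == HTML.Tag.STRONG) boldDepth = Math.max(0, boldDepth - 1);
                if (tag == HTML.Tag.I || tag == HTML.Tag.EM) italicDepth = Math.max(0, italicDepth - 1);
            }

            @Override
            public void handleText(char[] data, int pos) {
                runs.add(new StyledRun(new String(data), boldDepth > 0, italicDepth > 0));
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return runs;
    }

    public static void main(String[] args) throws IOException {
        for (StyledRun run : parse("<p>Hello <b>world</b> and <i>friends</i></p>")) {
            // With POI available, each run would become an XWPFRun here (assumption).
            System.out.println("[" + run.text + "] bold=" + run.bold + " italic=" + run.italic);
        }
    }
}
```

This ignores paragraphs, lists, tables, images and CSS, so it is only a starting point. For the binary .doc format, HWPF's write support is limited, which may be worth raising with the client before committing to POI for both formats.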