Converting a .doc file to HTML with styles

Time: 2013-01-22 12:20:26

Tags: html xhtml html-parsing apache-poi doc

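I am running into a problem with styles when converting a .doc file to HTML text using the Apache POI HWPFDocument class. Another issue is that it converts the style tag as follows: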

.b1{white-space-collapsing:preserve;} .b2{margin: 1.1798611in 1.1798611in 1.1798611in 1.1798611in;} .s1{font-weight:bold;color:black;} .s2{color:black;} .s3{font-style:italic;color:black;} .p1{text-align:center;hyphenate:none;font-family:Times New Roman;font-size:12pt;} .p2{text-align:justify;hyphenate:none;font-family:Times New Roman;font-size:12pt;} .p3{text-align:end;hyphenate:none;font-family:Times New Roman;font-size:12pt;}

Main Title

Here is my code:

HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(multipartFile.getInputStream());

WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
wordToHtmlConverter.processDocument(wordDocument);
org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(htmlDocument);
StreamResult streamResult = new StreamResult(out);

TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();

// Decode with the charset the serializer was told to write (UTF-8), not the platform default.
String html = out.toString("UTF-8");

I just need the content of the .doc file as properly formatted HTML text.
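
One possible direction, sketched here only as an assumption about the cause: if the class-based rules (.s1, .p1, ...) are lost because the generated markup is embedded somewhere that drops the stylesheet, the rules can be copied into style attributes on the DOM before it is serialized. The StyleInliner class and its sampleDocument() method are hypothetical helpers written for this illustration and are not part of Apache POI. The sketch uses only JDK classes, handles only simple ".name{...}" rules like the ones shown above, and assumes the converter's DOM keeps the rules in a style element and refers to them through class attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Apache POI.
class StyleInliner {

    // Matches simple class rules of the form ".name{declarations}".
    private static final Pattern RULE = Pattern.compile("\\.([\\w-]+)\\s*\\{([^}]*)\\}");

    /** Parses ".s1{font-weight:bold;}"-style rules into a class name -> declarations map. */
    static Map<String, String> parseRules(String css) {
        Map<String, String> rules = new LinkedHashMap<>();
        Matcher m = RULE.matcher(css);
        while (m.find()) {
            rules.put(m.group(1), m.group(2).trim());
        }
        return rules;
    }

    /** Copies the rules of every style element into style attributes, then drops the style elements. */
    static void inline(Document doc) {
        NodeList styles = doc.getElementsByTagName("style");
        StringBuilder css = new StringBuilder();
        // The NodeList is live, so walk it backwards while removing nodes.
        for (int i = styles.getLength() - 1; i >= 0; i--) {
            Node style = styles.item(i);
            css.insert(0, style.getTextContent());
            style.getParentNode().removeChild(style);
        }
        Map<String, String> rules = parseRules(css.toString());

        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            StringBuilder merged = new StringBuilder();
            for (String cls : el.getAttribute("class").trim().split("\\s+")) {
                String declarations = rules.get(cls);
                if (declarations == null || declarations.isEmpty()) {
                    continue;
                }
                merged.append(declarations);
                if (merged.charAt(merged.length() - 1) != ';') {
                    merged.append(';');
                }
            }
            // Existing inline declarations go last so they still take precedence.
            merged.append(el.getAttribute("style").trim());
            if (merged.length() > 0) {
                el.setAttribute("style", merged.toString());
            }
        }
    }

    /** Builds a tiny stand-in for the converter's output (illustration only). */
    static Document sampleDocument() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element html = doc.createElement("html");
        Element head = doc.createElement("head");
        Element style = doc.createElement("style");
        style.setTextContent(".s1{font-weight:bold;color:black;} .p1{text-align:center;}");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setAttribute("class", "p1");
        Element span = doc.createElement("span");
        span.setAttribute("class", "s1");
        span.setTextContent("Main Title");
        doc.appendChild(html);
        html.appendChild(head);
        head.appendChild(style);
        html.appendChild(body);
        body.appendChild(p);
        p.appendChild(span);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        inline(doc);
        Element span = (Element) doc.getElementsByTagName("span").item(0);
        System.out.println(span.getAttribute("style")); // prints "font-weight:bold;color:black;"
    }
}
```

In the code from the question, the call would sit between getDocument() and serializer.transform(...), for example StyleInliner.inline(htmlDocument);. Whether this resolves the styling problem depends on how the resulting HTML is displayed, which the question does not say.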

0 answers:

No answers