How can I limit the pages output when using WordToHtmlConverter and HWPFDocument?

Posted: 2018-07-31 19:35:16

Tags: java apache apache-poi

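I am converting Word (.doc) files to HTML, and I would like to output only some of the pages. Is it possible to limit the output range? I am open to either creating a new HWPFDocument from the original that contains only a subset of the pages, or truncating the output after the conversion.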

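import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.w3c.dom.Document;

// Load the .doc file and convert it into an HTML DOM
File localFile = ...
FileInputStream fis = new FileInputStream(localFile);
HWPFDocument wordDoc = new HWPFDocument(fis);
Document newDoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDoc);
wordToHtmlConverter.processDocument(wordDoc);

// Serialize the DOM to an HTML string
StringWriter stringWriter = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
transformer.setOutputProperty(OutputKeys.METHOD, "html");
transformer.transform(
    new DOMSource(wordToHtmlConverter.getDocument()),
    new StreamResult(stringWriter));
String htmlString = stringWriter.toString();

// Write the HTML to disk as UTF-8
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream(htmlFile), "UTF-8"));
out.write(htmlString);
out.close();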

1 answer:

Answer 0: (score: 0)

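Not with POI. The HWPF format has no concept of a page. Pages are an artifact of the consumer: until a consumer renders the document there are no pages, and each client may paginate slightly differently, even across different versions of Word.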
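
Since page boundaries are not available, the closest option to "limiting the length after conversion" is to cut the converted DOM by some other unit. The following is only a minimal sketch under stated assumptions: `HtmlTruncator` and `truncateBody` are hypothetical names, not part of POI, and the code assumes the DOM returned by `wordToHtmlConverter.getDocument()` contains a `body` element whose element children are the blocks to count. Depending on how the converter nests its output (for example, one wrapper element per section), a top-level child may be much larger than a paragraph, so the cut can be coarse and will not correspond to Word's pages.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class HtmlTruncator {

    /**
     * Keeps at most maxBlocks element children of the first <body> element
     * and removes every node that follows them. This is an approximation
     * by block count, not by page.
     */
    public static void truncateBody(Document htmlDoc, int maxBlocks) {
        Node body = htmlDoc.getElementsByTagName("body").item(0);
        if (body == null) {
            return; // nothing to truncate if the DOM has no <body>
        }
        int kept = 0;
        Node child = body.getFirstChild();
        while (child != null) {
            // Remember the next sibling before possibly removing this node
            Node next = child.getNextSibling();
            if (kept >= maxBlocks) {
                body.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                kept++;
            }
            child = next;
        }
    }
}
```

In the code from the question, this would be called as `HtmlTruncator.truncateBody(wordToHtmlConverter.getDocument(), 20);` after `processDocument(wordDoc)` and before `transformer.transform(...)`; the value 20 is an arbitrary illustration.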