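I am converting Word / .doc files to HTML, and I would like to be able to get only some of the pages. Is it possible to limit the output range? I am willing either to create a new HWPFDocument from the original that contains only a subset of the pages, or to limit the length after the conversion.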
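import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.w3c.dom.Document;
File localFile = ...
FileInputStream fis = new FileInputStream(localFile);
HWPFDocument wordDoc = new HWPFDocument(fis);
// Convert the whole document into an HTML DOM.
Document newDoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDoc);
wordToHtmlConverter.processDocument(wordDoc);
// Serialize the DOM to an HTML string.
StringWriter stringWriter = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
transformer.setOutputProperty(OutputKeys.METHOD, "html");
transformer.transform(
        new DOMSource(wordToHtmlConverter.getDocument()),
        new StreamResult(stringWriter));
String htmlString = stringWriter.toString();
// Write the HTML to disk (htmlFile is the output file, not shown here).
try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(htmlFile), "UTF-8"))) {
    out.write(htmlString);
}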
Answer 0 (score: 0)
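Not with POI. The HWPF format has no concept of pages. Pages are an artifact of the consumer: until a client lays out and renders the document, there are no pages, and each client may render them slightly differently, even between different versions of Word.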
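If an approximate cut is good enough, the "limit the length after conversion" idea from the question can be sketched on the DOM side, before serializing. The snippet below is only an illustration: `HtmlTruncator` and `keepFirstChildElements` are hypothetical names, not part of POI, and it counts block elements, not pages. Which element actually holds the paragraphs in the `WordToHtmlConverter` output (the `body` itself or a wrapper `div` inside it) is an assumption you should check against your own output. The demo uses a small hand-written document so it runs with the JDK alone.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class HtmlTruncator {

    /**
     * Keeps the first maxElements child elements of container and removes
     * every child node that follows them. Returns the number of removed
     * child elements. This is a hypothetical helper, not a POI API.
     */
    public static int keepFirstChildElements(Element container, int maxElements) {
        int seen = 0;
        int removed = 0;
        Node child = container.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // capture before a possible removal
            boolean isElement = child.getNodeType() == Node.ELEMENT_NODE;
            if (seen >= maxElements) {
                container.removeChild(child);
                if (isElement) {
                    removed++;
                }
            } else if (isElement) {
                seen++;
            }
            child = next;
        }
        return removed;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the converter output; a real document has more structure.
        String xml = "<html><body><p>one</p><p>two</p><p>three</p><p>four</p></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element body = (Element) doc.getElementsByTagName("body").item(0);

        int removed = keepFirstChildElements(body, 2);
        System.out.println("removed " + removed + ", kept text: " + body.getTextContent());
        // prints "removed 2, kept text: onetwo"
    }
}
```

In the code from the question, a call like this would go between `processDocument(wordDoc)` and `transformer.transform(...)`, applied to the element of `wordToHtmlConverter.getDocument()` that contains the paragraphs. The cut-off is a paragraph count you pick yourself; it will not line up with the page breaks Word shows.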