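How to add an altChunk element to an XWPFDocument using Apache POI

Date: 2018-11-30 19:51:43

Tags: java apache-poi

I want to use Apache POI to add HTML as an altChunk to a DOCX file. I know that docx4j can do this with a simpler API, but for technical reasons I need to use Apache POI.

Low-level manipulation of the XML through the CT classes is a bit tricky. I can create the altChunk with the following code: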

import java.io.File;
import java.io.FileOutputStream;

import javax.xml.namespace.QName;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.impl.values.XmlComplexContentImpl;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDocument1;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTBodyImpl;

public class AltChunkTest {
    public static void main(String[] args) throws Exception  {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("AltChunk below:");
        QName ALTCHUNK = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "altChunk");
        CTDocument1 ctDoc = doc.getDocument();
        CTBodyImpl ctBody = (CTBodyImpl) ctDoc.getBody();
        XmlComplexContentImpl xcci = (XmlComplexContentImpl) ctBody.get_store().add_element_user(ALTCHUNK);
        // what is needed now to add "<b>Hello World!</b>"?
        FileOutputStream out = new FileOutputStream(new File("test.docx"));
        doc.write(out);
        out.close();
    }
}

But how do I now add the HTML content to 'xcci'?

3 Answers:

Answer 0: (score: 3)

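In Office Open XML for Word (*.docx), altChunk provides a way to describe parts of the document using plain HTML.

Two important notes about altChunk:

First: it is only used for importing content. If you open the document in Word and save it, the newly saved document will contain neither the alternative format content part nor the altChunk markup that referenced it. Word saves all imported content as default Word elements.

Second: apart from Word, most other applications that can read *.docx will not read the altChunk content at all. For example, LibreOffice or OpenOffice Writer will not read the altChunk content, and neither will apache poi when it opens a *.docx file.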

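How is altChunk implemented in the *.docx ZIP file structure?

There are /word/*.html files in the *.docx ZIP file. They are referenced by Id in /word/document.xml, for example as <w:altChunk r:id="htmlDoc1"/>. The relation between the Ids and the /word/*.html files is given in /word/_rels/document.xml.rels, for example as <Relationship Id="htmlDoc1" Target="htmlDoc1.html" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk"/>.

So we first need POIXMLDocumentParts for the /word/*.html files and POIXMLRelations for the relation between the Ids and the /word/*.html files. The following code provides this through a wrapper class that extends POIXMLDocumentPart for the /word/htmlDoc#.html files in the *.docx ZIP archive. The wrapper also offers methods for manipulating the HTML. In addition, there is a method that creates a /word/htmlDoc#.html file in the *.docx ZIP archive and the relation to it.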

Code:

import java.io.*;

import org.apache.poi.*;
import org.apache.poi.ooxml.*;
import org.apache.poi.openxml4j.opc.*;
import org.apache.poi.xwpf.usermodel.*;

public class CreateWordWithHTMLaltChunk {

    //a method for creating the htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive
    //String id will be htmlDoc#.
    private static MyXWPFHtmlDocument createHtmlDoc(XWPFDocument document, String id) throws Exception {
        OPCPackage oPCPackage = document.getPackage();
        PackagePartName partName = PackagingURIHelper.createPartName("/word/" + id + ".html");
        PackagePart part = oPCPackage.createPart(partName, "text/html");
        MyXWPFHtmlDocument myXWPFHtmlDocument = new MyXWPFHtmlDocument(part, id);
        document.addRelation(myXWPFHtmlDocument.getId(), new XWPFHtmlRelation(), myXWPFHtmlDocument);
        return myXWPFHtmlDocument;
    }

    public static void main(String[] args) throws Exception {
        XWPFDocument document = new XWPFDocument();

        XWPFParagraph paragraph;
        XWPFRun run;
        MyXWPFHtmlDocument myXWPFHtmlDocument;

        paragraph = document.createParagraph();
        run = paragraph.createRun();
        run.setText("Default paragraph followed by first HTML chunk.");

        myXWPFHtmlDocument = createHtmlDoc(document, "htmlDoc1");
        myXWPFHtmlDocument.setHtml(myXWPFHtmlDocument.getHtml().replace("<body></body>",
            "<body><p>Simple <b>HTML</b> <i>formatted</i> <u>text</u></p></body>"));
        document.getDocument().getBody().addNewAltChunk().setId(myXWPFHtmlDocument.getId());

        paragraph = document.createParagraph();
        run = paragraph.createRun();
        run.setText("Default paragraph followed by second HTML chunk.");

        myXWPFHtmlDocument = createHtmlDoc(document, "htmlDoc2");
        myXWPFHtmlDocument.setHtml(myXWPFHtmlDocument.getHtml().replace("<body></body>",
            "<body>" +
            "<table>" +
            "<caption>A table</caption>" +
            "<tr><th>Name</th><th>Date</th><th>Amount</th></tr>" +
            "<tr><td>John Doe</td><td>2018-12-01</td><td>1,234.56</td></tr>" +
            "</table>" +
            "</body>"
        ));
        document.getDocument().getBody().addNewAltChunk().setId(myXWPFHtmlDocument.getId());

        FileOutputStream out = new FileOutputStream("CreateWordWithHTMLaltChunk.docx");
        document.write(out);
        out.close();
        document.close();
    }

    //a wrapper class for the htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive
    //provides methods for manipulating the HTML
    //TODO: We should *not* be using String methods for manipulating HTML!
    private static class MyXWPFHtmlDocument extends POIXMLDocumentPart {

        private String html;
        private String id;

        private MyXWPFHtmlDocument(PackagePart part, String id) throws Exception {
            super(part);
            this.html = "<!DOCTYPE html><html><head><style></style><title>HTML import</title></head><body></body></html>";
            this.id = id;
        }

        private String getId() {
            return id;
        }

        private String getHtml() {
            return html;
        }

        private void setHtml(String html) {
            this.html = html;
        }

        //writes the HTML into the package part when the document is saved
        @Override
        protected void commit() throws IOException {
            PackagePart part = getPackagePart();
            OutputStream out = part.getOutputStream();
            Writer writer = new OutputStreamWriter(out, "UTF-8");
            writer.write(html);
            writer.close();
            out.close();
        }
    }

    //the XWPFRelation for /word/htmlDoc#.html
    private final static class XWPFHtmlRelation extends POIXMLRelation {
        private XWPFHtmlRelation() {
            super(
                "text/html",
                "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk",
                "/word/htmlDoc#.html");
        }
    }
}

Note: because it uses altChunk (CTBody.addNewAltChunk()), this code needs the full jar of all the schemas (ooxml-schemas), as mentioned in apache poi faq-N10025.
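To check that a generated file really has the layout described above, the relationships part can be read with nothing but the JDK. The following is a minimal sketch, not part of the original answer: the class name AltChunkRelsInspector and the method findAltChunkTargets are hypothetical, and it assumes the file is an ordinary *.docx ZIP archive whose relationships live in word/_rels/document.xml.rels.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not from the answer): lists the aFChunk relationships of a *.docx.
public class AltChunkRelsInspector {

    static final String AFCHUNK_TYPE =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk";
    static final String RELS_NS =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Returns relationship Id -> Target for every aFChunk relationship of /word/document.xml.
    public static Map<String, String> findAltChunkTargets(Path docx) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry rels = zip.getEntry("word/_rels/document.xml.rels");
            if (rels == null) {
                return result; // no relationships part, so no altChunk targets
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            try (InputStream in = zip.getInputStream(rels)) {
                Document dom = factory.newDocumentBuilder().parse(in);
                NodeList nodes = dom.getElementsByTagNameNS(RELS_NS, "Relationship");
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element rel = (Element) nodes.item(i);
                    // keep only the relationships that point at alternative format chunks
                    if (AFCHUNK_TYPE.equals(rel.getAttribute("Type"))) {
                        result.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Path docx = Paths.get(args.length > 0 ? args[0] : "CreateWordWithHTMLaltChunk.docx");
        if (!Files.exists(docx)) {
            System.out.println("File not found: " + docx);
            return;
        }
        findAltChunkTargets(docx).forEach((id, target) -> System.out.println(id + " -> " + target));
    }
}
```

Each Id printed here should match the r:id of one w:altChunk element in /word/document.xml. Running it against a file that Word has already re-saved should list nothing, since Word drops the alternative format parts on save.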


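Answer 1: (score: 1)

Based on Axel Richter's answer, I replaced the call to CTBody.addNewAltChunk() with CTBodyImpl.get_store().add_element_user(QName), which removes the 15MB dependency on ooxml-schemas. Since this is used in a desktop application, we try to keep the application as small as possible. In case it helps anyone else: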

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

import javax.xml.namespace.QName;

import org.apache.poi.ooxml.POIXMLDocumentPart;
import org.apache.poi.ooxml.POIXMLRelation;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackagePartName;
import org.apache.poi.openxml4j.opc.PackagingURIHelper;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.xmlbeans.SimpleValue;
import org.apache.xmlbeans.impl.values.XmlComplexContentImpl;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTBodyImpl;

public class AltChunkTest {
    public static void main(String[] args) throws Exception  {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("AltChunk below:");
        addHtml(doc,"chunk1","<!DOCTYPE html><html><head><style></style><title></title></head><body><b>Hello World!</b></body></html>");
        try (FileOutputStream out = new FileOutputStream(new File("test.docx"))) {
            doc.write(out);
        }
    }

    static void addHtml(XWPFDocument doc, String id,String html) throws Exception {
        OPCPackage oPCPackage = doc.getPackage();
        PackagePartName partName = PackagingURIHelper.createPartName("/word/" + id + ".html");
        PackagePart part = oPCPackage.createPart(partName, "text/html");
        class HtmlRelation extends POIXMLRelation {
            private HtmlRelation() {
                super(  "text/html",
                        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk",
                        "/word/htmlDoc#.html");
            }
        }
        class HtmlDocumentPart extends POIXMLDocumentPart {
            private HtmlDocumentPart(PackagePart part) throws Exception {
                super(part);
            }

            @Override
            protected void commit() throws IOException {
                try (OutputStream out = part.getOutputStream()) {
                    try (Writer writer = new OutputStreamWriter(out, "UTF-8")) {
                        writer.write(html);
                    }
                }
            }
        }
        HtmlDocumentPart documentPart = new HtmlDocumentPart(part);
        doc.addRelation(id, new HtmlRelation(), documentPart);
        CTBodyImpl b = (CTBodyImpl) doc.getDocument().getBody();
        QName ALTCHUNK = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "altChunk");
        XmlComplexContentImpl altchunk = (XmlComplexContentImpl) b.get_store().add_element_user(ALTCHUNK);
        QName ID = new QName("http://schemas.openxmlformats.org/officeDocument/2006/relationships", "id");
        SimpleValue target = (SimpleValue)altchunk.get_store().add_attribute_user(ID);
        target.setStringValue(id);
    }
}
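For orientation, the two QName values in addHtml name the element w:altChunk and its attribute r:id. The sketch below is an illustration only and does not use Apache POI: it builds the same element with the JDK's DOM API so the namespace and local names can be checked in isolation. The class name AltChunkElementSketch and the method buildAltChunk are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration: the element that the two QName constants describe.
public class AltChunkElementSketch {

    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Builds a stand-alone w:altChunk element whose r:id attribute holds the relationship id.
    public static Element buildAltChunk(String relationId) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder().newDocument();
        Element altChunk = dom.createElementNS(W_NS, "w:altChunk");
        altChunk.setAttributeNS(R_NS, "r:id", relationId);
        dom.appendChild(altChunk);
        return altChunk;
    }

    public static void main(String[] args) throws Exception {
        Element altChunk = buildAltChunk("chunk1");
        System.out.println("{" + altChunk.getNamespaceURI() + "}" + altChunk.getLocalName());
        System.out.println("r:id = " + altChunk.getAttributeNS(R_NS, "id"));
    }
}
```

The id passed here plays the same role as the id given to doc.addRelation(...) in the answer: both must be the same string, otherwise the altChunk points at a relationship that does not exist.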

Answer 2: (score: 0)

This works fine in poi-ooxml 4.0.0, where the POIXMLDocumentPart and POIXMLRelation classes are in the package org.apache.poi.ooxml.*

import org.apache.poi.ooxml.POIXMLDocumentPart;
import org.apache.poi.ooxml.POIXMLRelation;

But how can we use this in poi-ooxml 3.9, where these classes are slightly different and are in the package org.apache.poi.*

import org.apache.poi.POIXMLDocumentPart;
import org.apache.poi.POIXMLRelation;