Docx4j: differences between two Word documents

Date: 2014-06-24 11:23:15

Tags: java ms-word diff docx4j

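I need to check the differences between two Word docx files, and I am using docx4j. First I had to change the SmartXMLFormatter constructor so that it registers the WordprocessingML prefixes: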

    public SmartXMLFormatter(Writer w) throws IOException {
        this.xml = new XMLWriterNSImpl(w, false);
        if (this.writeXMLDeclaration) {
            this.xml.xmlDecl();
            this.writeXMLDeclaration = false;
        }

        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "w");
        this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2010/wordml", "w14");
        this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2012/wordml", "w15");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/officeDocument/2006/relationships", "r");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing", "wp");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/main", "a");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/picture", "pic");

        this.xml.setPrefixMapping(Constants.BASE_NS_URI, "dfx");
        this.xml.setPrefixMapping(Constants.DELETE_NS_URI, "del");
        this.xml.setPrefixMapping(Constants.INSERT_NS_URI, "ins");
    }

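After this change, everything works for documents that contain no Russian letters. But when I diff two docx documents that contain Russian characters, I get the following exception (my JVM locale is German; the message says that the prefix "w14" for attribute "w14:paraId" on element type "w:p" is not bound):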

    org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
    at docx4jDiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Exception in thread "main" javax.xml.bind.UnmarshalException
     - with linked exception:
    [org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.]
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
    at docx4jDiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    ... 7 more
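
The stack trace shows the failure happening while the diff output string is parsed, not while the diff is computed. As a minimal sketch of that failure mode, using only the JDK (no docx4j), the following parses a tiny hand-written `w:body` fragment with a namespace-aware parser. The class name `UnboundPrefixDemo`, the helper names, and the `paraId` value are made up for illustration; only the two namespace URIs come from the code above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class UnboundPrefixDemo {

    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String W14_NS = "http://schemas.microsoft.com/office/word/2010/wordml";

    /** Builds a tiny w:body fragment; xmlns:w14 is declared only when requested. */
    static String fragment(boolean declareW14) {
        String w14Decl = declareW14 ? " xmlns:w14=\"" + W14_NS + "\"" : "";
        return "<w:body xmlns:w=\"" + W_NS + "\"" + w14Decl + ">"
                + "<w:p w14:paraId=\"1A2B3C4D\"/>"   // hypothetical attribute value
                + "</w:body>";
    }

    /** Returns true if a namespace-aware parser accepts the XML, false on a parse error. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // prefix binding is only checked in this mode
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            System.out.println("Parse failed: " + e.getMessage());
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without xmlns:w14 -> " + parses(fragment(false)));
        System.out.println("with xmlns:w14    -> " + parses(fragment(true)));
    }
}
```

If the diff result in my program looks like the first fragment, that is, `w14:paraId` appears on a `w:p` whose ancestors never declare `xmlns:w14`, it would explain the exception. Why that would happen only for the documents with Russian text is still unclear to me.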

So, can anyone help me?

This is the main code:

    public class CompareDocumentsUsingDriver {

    public static JAXBContext context = org.docx4j.jaxb.Context.jc;

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {
        System.setProperty("file.encoding", "UTF-8");

        String newerfilepath = "B.docx";
        String olderfilepath = "A.docx";

        // 1. Load the Packages
        WordprocessingMLPackage newerPackage = WordprocessingMLPackage
                .load(new java.io.File(newerfilepath));
        WordprocessingMLPackage olderPackage = WordprocessingMLPackage
                .load(new java.io.File(olderfilepath));

        Body newerBody = ((Document) newerPackage.getMainDocumentPart()
                .getJaxbElement()).getBody();
        Body olderBody = ((Document) olderPackage.getMainDocumentPart()
                .getJaxbElement()).getBody();

        System.out.println("Differencing..");

        // 2. Do the differencing
        StringWriter sw = new StringWriter();

        Docx4jDriver.diff(XmlUtils.marshaltoW3CDomDocument(newerBody)
                .getDocumentElement(),
                XmlUtils.marshaltoW3CDomDocument(olderBody)
                        .getDocumentElement(), sw);
        // The signature which takes Reader objects appears to be broken

        // 3. Get the result

        String contentStr = sw.toString();
        System.out.println("Result: \n\n " + contentStr);

        Body newBody = (Body) XmlUtils.unwrap(XmlUtils.unmarshalString(contentStr));


        // In the general case, you need to handle relationships. Not done here!

        // RelationshipsPart rp =
        // newerPackage.getMainDocumentPart().getRelationshipsPart();
        // handleRels(pd, rp);
        newerPackage.setFontMapper(new IdentityPlusMapper());
        newerPackage.save(new java.io.File("COMPARED.docx"));

    }

    /**
     * In the general case, you need to handle relationships. Although not
     * necessary in this simple example, we do it anyway for the purposes of
     * illustration.
     */
    private static void handleRels(Differencer pd, RelationshipsPart rp) {
        // Since we are going to add rels appropriate to the docs being
        // compared, for neatness and to avoid duplication
        // (duplication of internal part names is fatal in Word,
        // and export xslt makes images internal, though it does avoid
        // duplicating
        // a part ),
        // remove any existing rels which point to images
        List<Relationship> relsToRemove = new ArrayList<Relationship>();
        for (Relationship r : rp.getRelationships().getRelationship()) {
            // Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
            if (r.getType().equals(Namespaces.IMAGE)) {
                relsToRemove.add(r);
            }
        }
        for (Relationship r : relsToRemove) {
            rp.removeRelationship(r);
        }

        // Now add the rels we composed
        List<Relationship> newRels = pd.getComposedRels();
        for (Relationship nr : newRels) {
            rp.addRelationship(nr);
        }
    }

    }

Best regards,

Edit:

    public static void openResult(String nodename,  Writer out) throws IOException {
        // In general, we need to avoid writing directly to Writer out...
        // since it can happen before formatter output gets there

        // namespaces not properly declared:
        // 4 options:
        // 1:
        // OpenElementEvent containerOpen = new OpenElementEventNSImpl(xml1.getNamespaceURI(), rootNodeName);
        // formatter.format(containerOpen);
        // // AttributeEvent wNS = new AttributeEventNSImpl("http://www.w3.org/2000/xmlns/" , "w",
        // //       "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        // // formatter.format(wNS);
        // but AttributeEvent is too late in the process to set the mapping.
        // so you can comment that out.
        // But you still have to add w: and the other namespaces in
        // SmartXMLFormatter constructor. So may as well do 2.:
        // 2: stick all known namespaces on our root element above
        // 3: fix SmartXMLFormatter
        // Go with option 2 .. since this is clear
        out.append("<" + nodename
                + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""  // w: namespace
                + " xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\""
                + " xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\""
                + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
                + " xmlns:v=\"urn:schemas-microsoft-com:vml\""
                + " xmlns:w14=\"http://schemas.microsoft.com/office/word/2010/wordml\""
                + " xmlns:w15=\"http://schemas.microsoft.com/office/word/2012/wordml\""
                + " xmlns:w10=\"urn:schemas-microsoft-com:office:word\""
                + " xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\""
                + " xmlns:dfx=\"" + Constants.BASE_NS_URI + "\""  // Add these, since SmartXMLFormatter only writes them on the first fragment
                + " xmlns:del=\"" + Constants.DELETE_NS_URI + "\""
                + " xmlns:ins=\"" + Constants.BASE_NS_URI + "\""
                        + " >" );
    }
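
To narrow down which prefixes are actually missing from the diff output, a small diagnostic along the following lines could be run on the result string before it is passed to `XmlUtils.unmarshalString`. This is only a sketch using the JDK's SAX parser in non-namespace-aware mode (so that unbound prefixes do not abort the parse); `UnboundPrefixFinder` and `findUnboundPrefixes` are hypothetical names and not part of docx4j.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class UnboundPrefixFinder {

    /** Returns every prefix used on an element or attribute without an xmlns declaration in scope. */
    static Set<String> findUnboundPrefixes(String xml) {
        final Set<String> unbound = new TreeSet<>();
        final Deque<Set<String>> scopes = new ArrayDeque<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Start from the parent's declarations; "xml" is always bound.
                Set<String> inScope = new HashSet<>();
                inScope.add("xml");
                if (!scopes.isEmpty()) {
                    inScope.addAll(scopes.peek());
                }
                // Declarations on this element apply to the element itself too.
                for (int i = 0; i < atts.getLength(); i++) {
                    String name = atts.getQName(i);
                    if (name.startsWith("xmlns:")) {
                        inScope.add(name.substring("xmlns:".length()));
                    }
                }
                record(qName, inScope, unbound);
                for (int i = 0; i < atts.getLength(); i++) {
                    String name = atts.getQName(i);
                    if (!name.equals("xmlns") && !name.startsWith("xmlns:")) {
                        record(name, inScope, unbound);
                    }
                }
                scopes.push(inScope);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                scopes.pop();
            }
        };

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(false);   // report xmlns:* as plain attributes, skip binding checks
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return unbound;
    }

    /** Adds the prefix of qName to unbound if it has one and it is not in scope. */
    private static void record(String qName, Set<String> inScope, Set<String> unbound) {
        int colon = qName.indexOf(':');
        if (colon > 0 && !inScope.contains(qName.substring(0, colon))) {
            unbound.add(qName.substring(0, colon));
        }
    }
}
```

In the main program above this would be called as `UnboundPrefixFinder.findUnboundPrefixes(contentStr)`; an empty set would mean the namespace declarations written by `openResult` cover every prefix in the output.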

0 answers:

No answers yet