I need to find the differences between two Word docx files. I am using docx4j. First I had to change the SmartXMLFormatter constructor:
public SmartXMLFormatter(Writer w) throws IOException {
    this.xml = new XMLWriterNSImpl(w, false);
    if (this.writeXMLDeclaration) {
        this.xml.xmlDecl();
        this.writeXMLDeclaration = false;
    }
    this.xml.setPrefixMapping("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "w");
    this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2010/wordml", "w14");
    this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2012/wordml", "w15");
    this.xml.setPrefixMapping("http://schemas.openxmlformats.org/officeDocument/2006/relationships", "r");
    this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing", "wp");
    this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/main", "a");
    this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/picture", "pic");
    this.xml.setPrefixMapping(Constants.BASE_NS_URI, "dfx");
    this.xml.setPrefixMapping(Constants.DELETE_NS_URI, "del");
    this.xml.setPrefixMapping(Constants.INSERT_NS_URI, "ins");
}
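After that change, my code worked fine on documents without Russian letters. But when I diff two docx documents that contain Russian characters, I get the following exception: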
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
at docx4jDiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
Exception in thread "main" javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
at docx4jDiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
... 7 more
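For reference, the parser message above (German for: prefix "w14" for attribute "w14:paraId" associated with element type "w:p" is not bound) can be reproduced without docx4j. The sketch below is only an illustration with a hypothetical class name, `UnboundPrefixDemo`; it parses two small strings with a namespace-aware JDK parser, one where `w14` is never declared and one where it is declared on the root. Both contain Cyrillic text, so it does not show where the declaration gets lost in the diff output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of docx4j or Diff-X.
public class UnboundPrefixDemo {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String W14 = "http://schemas.microsoft.com/office/word/2010/wordml";

    /** Returns true if the string is well-formed for a namespace-aware parser. */
    public static boolean parses(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            // DefaultHandler keeps the parser from printing to stderr;
            // fatal errors are still thrown as SAXParseException.
            db.setErrorHandler(new DefaultHandler());
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false; // e.g. an attribute prefix with no xmlns declaration in scope
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Cyrillic text, written with escapes so the source file encoding does not matter
        String text = "<w:r><w:t>\u041f\u0440\u0438\u0432\u0435\u0442</w:t></w:r>";
        String unbound = "<w:body xmlns:w=\"" + W + "\">"
                + "<w:p w14:paraId=\"1\">" + text + "</w:p></w:body>";
        String bound = "<w:body xmlns:w=\"" + W + "\" xmlns:w14=\"" + W14 + "\">"
                + "<w:p w14:paraId=\"1\">" + text + "</w:p></w:body>";
        System.out.println("w14 not declared, parses: " + parses(unbound));
        System.out.println("w14 declared on root, parses: " + parses(bound));
    }
}
```

Running the same kind of check on the string passed to `XmlUtils.unmarshalString` (around column 10510 of line 1, per the trace) might help locate the element that has no `xmlns:w14` declaration in scope.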
So, can anyone help me?
Here is the main code:
public class CompareDocumentsUsingDriver {

    public static JAXBContext context = org.docx4j.jaxb.Context.jc;

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {
        System.setProperty("file.encoding", "UTF-8");
        String newerfilepath = "B.docx";
        String olderfilepath = "A.docx";

        // 1. Load the Packages
        WordprocessingMLPackage newerPackage = WordprocessingMLPackage
                .load(new java.io.File(newerfilepath));
        WordprocessingMLPackage olderPackage = WordprocessingMLPackage
                .load(new java.io.File(olderfilepath));
        Body newerBody = ((Document) newerPackage.getMainDocumentPart()
                .getJaxbElement()).getBody();
        Body olderBody = ((Document) olderPackage.getMainDocumentPart()
                .getJaxbElement()).getBody();
        System.out.println("Differencing..");

        // 2. Do the differencing
        StringWriter sw = new StringWriter();
        Docx4jDriver.diff(XmlUtils.marshaltoW3CDomDocument(newerBody)
                .getDocumentElement(),
                XmlUtils.marshaltoW3CDomDocument(olderBody)
                .getDocumentElement(), sw);
        // The signature which takes Reader objects appears to be broken

        // 3. Get the result
        String contentStr = sw.toString();
        System.out.println("Result: \n\n " + contentStr);
        Body newBody = (Body) XmlUtils.unwrap(XmlUtils.unmarshalString(contentStr));

        // In the general case, you need to handle relationships. Not done here!
        // RelationshipsPart rp =
        //     newerPackage.getMainDocumentPart().getRelationshipsPart();
        // handleRels(pd, rp);
        newerPackage.setFontMapper(new IdentityPlusMapper());
        newerPackage.save(new java.io.File("COMPARED.docx"));
    }

    /**
     * In the general case, you need to handle relationships. Although not
     * necessary in this simple example, we do it anyway for the purposes of
     * illustration.
     */
    private static void handleRels(Differencer pd, RelationshipsPart rp) {
        // Since we are going to add rels appropriate to the docs being
        // compared, for neatness and to avoid duplication
        // (duplication of internal part names is fatal in Word,
        // and export xslt makes images internal, though it does avoid
        // duplicating a part),
        // remove any existing rels which point to images
        List<Relationship> relsToRemove = new ArrayList<Relationship>();
        for (Relationship r : rp.getRelationships().getRelationship()) {
            // Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
            if (r.getType().equals(Namespaces.IMAGE)) {
                relsToRemove.add(r);
            }
        }
        for (Relationship r : relsToRemove) {
            rp.removeRelationship(r);
        }
        // Now add the rels we composed
        List<Relationship> newRels = pd.getComposedRels();
        for (Relationship nr : newRels) {
            rp.addRelationship(nr);
        }
    }
}
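A side note on the `System.setProperty("file.encoding", "UTF-8")` call in `main`: as far as I know, the JVM determines its default charset at startup, so setting this property at runtime is not a reliable way to change how text is encoded. Whether encoding plays any part in the exception above is not established here. The sketch below, with a hypothetical class name `ExplicitUtf8Demo`, only shows the usual alternative of naming the charset explicitly whenever text is turned into bytes or back.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of docx4j.
public class ExplicitUtf8Demo {

    /** Writes text through a Writer that uses UTF-8 regardless of the platform default. */
    public static byte[] writeUtf8(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes bytes as UTF-8 regardless of the platform default. */
    public static String readUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written with escapes so the source encoding does not matter
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] encoded = writeUtf8(russian);
        System.out.println("bytes: " + encoded.length);
        System.out.println("round trip ok: " + readUtf8(encoded).equals(russian));
    }
}
```

The same idea applies to any `FileWriter`, `InputStreamReader`, or `String.getBytes()` call on the path between the diff output and the unmarshaller: pass the charset explicitly, or start the JVM with `-Dfile.encoding=UTF-8` on the command line.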
Best regards,
添
Edit:
public static void openResult(String nodename, Writer out) throws IOException {
    // In general, we need to avoid writing directly to Writer out...
    // since it can happen before formatter output gets there
    // namespaces not properly declared:
    // 4 options:
    // 1:
    // OpenElementEvent containerOpen = new OpenElementEventNSImpl(xml1.getNamespaceURI(), rootNodeName);
    // formatter.format(containerOpen);
    // // AttributeEvent wNS = new AttributeEventNSImpl("http://www.w3.org/2000/xmlns/" , "w",
    // //     "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
    // // formatter.format(wNS);
    // but AttributeEvent is too late in the process to set the mapping.
    // so you can comment that out.
    // But you still have to add w: and the other namespaces in
    // SmartXMLFormatter constructor. So may as well do 2.:
    // 2: stick all known namespaces on our root element above
    // 3: fix SmartXMLFormatter
    // Go with option 2 .. since this is clear
    out.append("<" + nodename
            + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"" // w: namespace
            + " xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\""
            + " xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\""
            + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
            + " xmlns:v=\"urn:schemas-microsoft-com:vml\""
            + " xmlns:w14=\"http://schemas.microsoft.com/office/word/2010/wordml\""
            + " xmlns:w15=\"http://schemas.microsoft.com/office/word/2012/wordml\""
            + " xmlns:w10=\"urn:schemas-microsoft-com:office:word\""
            + " xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\""
            + " xmlns:dfx=\"" + Constants.BASE_NS_URI + "\"" // Add these, since SmartXMLFormatter only writes them on the first fragment
            + " xmlns:del=\"" + Constants.DELETE_NS_URI + "\""
            + " xmlns:ins=\"" + Constants.INSERT_NS_URI + "\"" // was BASE_NS_URI; the SmartXMLFormatter constructor maps ins to INSERT_NS_URI
            + " >");
}
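Since the same prefix list has to be kept in step in two places (the `SmartXMLFormatter` constructor and `openResult`), one possible way to make mismatches easier to spot is to build the root start tag from a single ordered map. This is only a sketch: `RootTagBuilder` and `openTag` are hypothetical names, and the `urn:example:` URIs in `main` are placeholders standing in for the real `Constants.*` values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of docx4j or Diff-X.
public class RootTagBuilder {

    /**
     * Builds a start tag that declares every prefix in the map, in insertion order.
     * Assumes the URIs contain no characters that need XML escaping (", <, &).
     */
    public static String openTag(String nodename, Map<String, String> prefixToUri) {
        StringBuilder sb = new StringBuilder("<").append(nodename);
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            sb.append(" xmlns:").append(e.getKey())
              .append("=\"").append(e.getValue()).append("\"");
        }
        return sb.append(">").toString();
    }

    public static void main(String[] args) {
        Map<String, String> ns = new LinkedHashMap<>();
        ns.put("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        ns.put("w14", "http://schemas.microsoft.com/office/word/2010/wordml");
        ns.put("w15", "http://schemas.microsoft.com/office/word/2012/wordml");
        // Placeholders: in the real code these would be the Diff-X Constants values
        ns.put("dfx", "urn:example:diffx-base");
        ns.put("del", "urn:example:diffx-delete");
        ns.put("ins", "urn:example:diffx-insert");
        System.out.println(openTag("w:body", ns));
    }
}
```

With one map shared by both places, a prefix such as `w14` or `ins` cannot be bound differently in the root tag and in the formatter.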