Converting from XML to Docx using Docx4j

Time: 2020-10-07 04:05:52

Tags: java xml docx4j

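I am using the Java Docx4j library to convert a .docx file to its .xml representation, store the XML in a database, and then convert the XML back to a .docx file.

So far I can successfully convert the .docx file to XML and store it in the database, but I am having trouble converting the XML back to .docx format. I am not editing the XML in any way. If I open the XML file in Word, it displays fine.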

String inputFilePath = args[0];
WordprocessingMLPackage wmlPackage = Docx4J.load(new File(inputFilePath));

ByteArrayOutputStream baos = new ByteArrayOutputStream();
Docx4J.save(wmlPackage, baos, Docx4J.FLAG_SAVE_FLAT_XML);

DatabaseController databaseController = new DatabaseController();
databaseController.commitXMLToDatabase(baos, "file-sample_1MB"); // Add the XML and filename to DB

String xml = databaseController.retrieveDocument("file-sample_1MB");

// Issue with the code below:
WordprocessingMLPackage testPkg = WordprocessingMLPackage.createPackage();
testPkg.getMainDocumentPart().unmarshal(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
testPkg.save(new File("src/main/resources/test1.docx")); 

I get the following error (I have removed some of the schema URLs it lists):

Exception in thread "main" javax.xml.bind.JAXBException
 - with linked exception:
[javax.xml.bind.UnmarshalException
 - with linked exception:
[com.sun.istack.SAXParseException2; lineNumber: 1; columnNumber: 133; unexpected element (uri:"http://schemas.microsoft.com/office/2006/xmlPackage", local:"package"). Expected elements are <{urn:schemas-microsoft-com:office:excel}ClientData>,<{http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing}wsDr>,<{}xml>,<{http://opendope.org/xpaths}xpath>,<{http://opendope.org/conditions}xpathref>,<{http://opendope.org/xpaths}xpaths>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}yearLong>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}yearShort>]]
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:586)
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:346)
    at DocxToXML.main(DocxToXML.java:37)
Caused by: javax.xml.bind.UnmarshalException

Any help would be greatly appreciated. I can post the .docx and .xml files if they would help.

1 answer:

Answer 0: (score: 0)

Fixed it now. The code I am now using is as follows:

// retrieveDocument() gets the data from DB Blob as a byte[] Array 
// and returns an InputStream
InputStream xml = databaseController.retrieveDocument("Test1"); 
WordprocessingMLPackage pkg = Docx4J.load(xml);
Docx4J.save(pkg, new File("src/main/resources/output/test1.docx"));
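
For anyone hitting the same exception, a note on why this probably works: the error message reports a root element `package` in the `http://schemas.microsoft.com/office/2006/xmlPackage` namespace. That is what saving with `Docx4J.FLAG_SAVE_FLAT_XML` writes: a single XML file describing the whole package, not just the main document part. Unmarshalling it into `getMainDocumentPart()` therefore fails, while loading it as a package with `Docx4J.load` succeeds, as the code above shows. The sketch below uses only the JDK's StAX API (no docx4j calls) to check which kind of root element a stored XML document has. The class name `RootElementCheck` and its methods are hypothetical, made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of docx4j: inspects the root element of an XML stream.
final class RootElementCheck {

    // Namespace of the root element reported in the exception above.
    static final String FLAT_OPC_NS = "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Returns the qualified name of the first element in the stream.
    static QName rootElement(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTDs are not needed for this check, so disable them.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getName();
                }
            }
            throw new XMLStreamException("no root element found");
        } finally {
            reader.close();
        }
    }

    // True when the root is <package> in the xmlPackage namespace,
    // i.e. the XML describes a whole package rather than one part.
    static boolean isFlatOpc(InputStream in) throws XMLStreamException {
        QName root = rootElement(in);
        return FLAT_OPC_NS.equals(root.getNamespaceURI())
                && "package".equals(root.getLocalPart());
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the XML retrieved from the database.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<pkg:package xmlns:pkg=\"" + FLAT_OPC_NS + "\"/>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println("Flat package XML: " + isFlatOpc(in));
    }
}
```

If `isFlatOpc` returns true for the data coming out of the database, that data should be handed to `Docx4J.load` as in the answer rather than unmarshalled into a single part. The check consumes the stream, so run it on a fresh stream over the same bytes.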