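I want to write custom metadata to PDF files. The properties I need are not covered by the standard XMP schemas, so I wrote my own schema containing my own properties. I can successfully write this additional custom metadata to my PDF files using either the PDFBox or the iTextPDF library. However, I cannot read the custom metadata back on the client side without parsing the XMP XML myself.
I assume there is some API I am not aware of that maps a custom schema back onto a Java class.
Am I thinking in the right direction, or do I really need to parse the XML to get my custom data on the client side?
Below is the code I wrote using the PDFBox library.
Custom metadata file.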
package com.ecomail.emx.core.xmp;

import java.io.IOException;
import org.apache.jempbox.xmp.XMPMetadata;

public class EMXMetadata extends XMPMetadata {

    public EMXMetadata() throws IOException {
        super();
    }

    public EMXSchema addEMXSchema() {
        EMXSchema schema = new EMXSchema(this);
        return (EMXSchema) basicAddSchema(schema);
    }

    public EMXSchema getEMXSchema() throws IOException {
        return (EMXSchema) getSchemaByClass(EMXSchema.class);
    }
}
Custom schema file.
package com.ecomail.emx.core.xmp;

import java.util.List;
import org.apache.jempbox.xmp.XMPMetadata;
import org.apache.jempbox.xmp.XMPSchema;
import org.w3c.dom.Element;

public class EMXSchema extends XMPSchema {

    public static final String NAMESPACE = "http://www.test.com/emx/elements/1.1/";

    public EMXSchema(XMPMetadata parent) {
        super(parent, "test", NAMESPACE);
    }

    public EMXSchema(Element element, String prefix) {
        super(element, prefix);
    }

    public String getMetaDataType() {
        return getTextProperty(prefix + ":metaDataType");
    }

    public void setMetaDataType(String metaDataType) {
        setTextProperty(prefix + ":metaDataType", metaDataType);
    }

    public void removeRecipient(String recipient) {
        removeBagValue(prefix + ":recipient", recipient);
    }

    public void addRecipient(String recipient) {
        addBagValue(prefix + ":recipient", recipient);
    }

    public List<String> getRecipients() {
        return getBagList(prefix + ":recipient");
    }
}
XMP client file.
package com.ecomail.emx.core.xmp;

import java.util.GregorianCalendar;
import org.apache.jempbox.xmp.XMPMetadata;
import org.apache.jempbox.xmp.XMPSchemaDublinCore;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.common.PDMetadata;

public class XMPClient {

    private XMPClient() {
    }

    public static void main(String[] args) throws Exception {
        PDDocument document = null;
        try {
            // Write: build the metadata and attach it to the document catalog.
            document = PDDocument.load("/home/silver/SVNRoot/ecomail/trunk/sample.pdf");
            PDDocumentCatalog catalog = document.getDocumentCatalog();
            PDDocumentInformation info = document.getDocumentInformation();
            EMXMetadata metadata = new EMXMetadata();

            XMPSchemaDublinCore dcSchema = metadata.addDublinCoreSchema();
            dcSchema.setTitle(info.getTitle());
            dcSchema.addContributor("Contributor");
            dcSchema.setCoverage("coverage");
            dcSchema.addCreator("PDFBox");
            dcSchema.addDate(new GregorianCalendar());
            dcSchema.setDescription("description");
            dcSchema.addLanguage("language");
            dcSchema.setFormat("format");

            EMXSchema emxSchema = metadata.addEMXSchema();
            emxSchema.addRecipient("Recipient 1");
            emxSchema.addRecipient("Recipient 2");

            PDMetadata metadataStream = new PDMetadata(document);
            metadataStream.importXMPMetadata(metadata);
            catalog.setMetadata(metadataStream);
            document.save("/home/silver/SVNRoot/ecomail/trunk/sample1.pdf");
            document.close();

            // Read back: reload the saved file and look up the custom schema.
            document = PDDocument.load("/home/silver/SVNRoot/ecomail/trunk/sample1.pdf");
            PDDocumentCatalog catalog2 = document.getDocumentCatalog();
            PDMetadata metadataStream2 = catalog2.getMetadata();
            XMPMetadata metadata2 = metadataStream2.exportXMPMetadata();
            EMXSchema emxSchema2 = (EMXSchema) metadata2.getSchemaByClass(EMXSchema.class);
            System.out.println("recipients : " + emxSchema2.getRecipients());
        } finally {
            if (document != null) {
                document.close();
            }
        }
    }
}
In the XMPClient file, I expect to get the EMXSchema object back from the re-read metadata by looking it up by class:
XMPMetadata metadata2 = metadataStream2.exportXMPMetadata();
EMXSchema emxSchema2 = (EMXSchema) metadata2.getSchemaByClass(EMXSchema.class);
System.out.println("recipients : " + emxSchema2.getRecipients());
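But I get a NullPointerException, which indicates that the schema was not found. Can anyone tell me whether I am on the right track, or whether I need to parse the XMP myself to get my recipient values?
Thanks
Answer 0: (score: 2)
I finally got it working myself. The solution is to use the other constructor of the XMPMetadata class, the one that accepts an already-parsed org.w3c.dom.Document.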
// Requires imports: java.io.InputStream, javax.xml.parsers.DocumentBuilder,
// javax.xml.parsers.DocumentBuilderFactory, org.w3c.dom.Document
document = PDDocument.load("/home/silver/SVNRoot/ecomail/trunk/sample1.pdf");
PDDocumentCatalog catalog2 = document.getDocumentCatalog();
PDMetadata metadataStream2 = catalog2.getMetadata();
System.out.println(metadataStream2.getInputStreamAsString());

// Parse the XMP packet into a namespace-aware DOM document.
InputStream xmpIn = metadataStream2.createInputStream();
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
f.setExpandEntityReferences(true);
f.setIgnoringComments(true);
f.setIgnoringElementContentWhitespace(true);
f.setValidating(false);
f.setCoalescing(true);
f.setNamespaceAware(true);
DocumentBuilder builder = f.newDocumentBuilder();
Document xmpDoc = builder.parse(xmpIn);

// Wrap the DOM in the custom metadata class instead of a plain XMPMetadata.
EMXMetadata emxMetadata = new EMXMetadata(xmpDoc);
EMXSchema emxSchema2 = emxMetadata.getEMXSchema();
System.out.println("recipients : " + emxSchema2.getRecipients());
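As a side note, the XMP packet is ordinary RDF/XML, so the recipients can also be read straight from the DOM. This is the manual parsing the question was hoping to avoid. The following is a minimal JDK-only sketch, not PDFBox code. It assumes the packet stores the recipients as an rdf:Bag of rdf:li items under the question's namespace. The class name XmpBagReaderSketch, the method readBag, and the hand-written sample packet are hypothetical and only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads rdf:Bag values from an XMP packet with plain DOM calls.
class XmpBagReaderSketch {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String EMX_NS = "http://www.test.com/emx/elements/1.1/";

    /** Returns the text of every rdf:li nested under the given property element. */
    static List<String> readBag(String xmpXml, String namespace, String localName) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = f.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xmpXml.getBytes(StandardCharsets.UTF_8)));
            List<String> values = new ArrayList<>();
            NodeList props = doc.getElementsByTagNameNS(namespace, localName);
            for (int i = 0; i < props.getLength(); i++) {
                NodeList items = ((Element) props.item(i)).getElementsByTagNameNS(RDF_NS, "li");
                for (int j = 0; j < items.getLength(); j++) {
                    values.add(items.item(j).getTextContent());
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XMP packet", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample packet (an assumption, not output captured from PDFBox).
        String xmp = "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
                + "<rdf:RDF xmlns:rdf='" + RDF_NS + "'>"
                + "<rdf:Description rdf:about='' xmlns:test='" + EMX_NS + "'>"
                + "<test:recipient><rdf:Bag>"
                + "<rdf:li>Recipient 1</rdf:li><rdf:li>Recipient 2</rdf:li>"
                + "</rdf:Bag></test:recipient>"
                + "</rdf:Description></rdf:RDF></x:xmpmeta>";
        System.out.println(readBag(xmp, EMX_NS, "recipient"));
    }
}
```

The lookup goes by namespace URI rather than by the "test" prefix, so it keeps working if another writer chooses a different prefix for the same namespace.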
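With the Document-based constructor, my custom emxMetadata contains a non-null emxSchema2 object, and I can get my recipients from it. To make this work, however, I had to modify EMXMetadata so that it registers an XML namespace mapping for the schema class: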
package com.ecomail.emx.core.xmp;

import java.io.IOException;
import org.apache.jempbox.xmp.XMPMetadata;
import org.w3c.dom.Document;

public class EMXMetadata extends XMPMetadata {

    public EMXMetadata() throws IOException {
        super();
        // Register the custom namespace so it is wrapped in an EMXSchema.
        addXMLNSMapping(EMXSchema.NAMESPACE, EMXSchema.class);
    }

    public EMXMetadata(Document xmpDoc) {
        super(xmpDoc);
        addXMLNSMapping(EMXSchema.NAMESPACE, EMXSchema.class);
    }

    public EMXSchema addEMXSchema() {
        EMXSchema schema = new EMXSchema(this);
        return (EMXSchema) basicAddSchema(schema);
    }

    public EMXSchema getEMXSchema() throws IOException {
        return (EMXSchema) getSchemaByClass(EMXSchema.class);
    }
}
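My understanding of why the mapping matters, offered as an explanation rather than a description of the library's exact internals: the metadata object seems to keep a table from namespace URI to schema class, and wraps any unregistered namespace in a generic schema object, so a lookup by class finds nothing. The toy model below illustrates that idea with plain JDK code. All names in it (SchemaRegistrySketch, GenericSchema, CustomSchema, wrap, findByClass) are hypothetical stand-ins, not jempbox types.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical stand-ins, not jempbox types.
class SchemaRegistrySketch {

    /** Stand-in for a generic schema wrapper. */
    public static class GenericSchema {
        public final String prefix;

        public GenericSchema(String prefix) {
            this.prefix = prefix;
        }
    }

    /** Stand-in for a custom subclass such as EMXSchema. */
    public static class CustomSchema extends GenericSchema {
        public CustomSchema(String prefix) {
            super(prefix);
        }
    }

    private final Map<String, Class<? extends GenericSchema>> nsMappings = new HashMap<>();

    /** Registers which class should wrap elements of the given namespace. */
    public void addMapping(String namespace, Class<? extends GenericSchema> type) {
        nsMappings.put(namespace, type);
    }

    /** Wraps a namespace in its registered class, or in GenericSchema if none is registered. */
    public GenericSchema wrap(String namespace, String prefix) {
        Class<? extends GenericSchema> type = nsMappings.get(namespace);
        if (type == null) {
            return new GenericSchema(prefix);
        }
        try {
            return type.getConstructor(String.class).newInstance(prefix);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type, e);
        }
    }

    /** Returns the schema only if it is exactly of the wanted class, otherwise null. */
    public static <T extends GenericSchema> T findByClass(GenericSchema schema, Class<T> wanted) {
        return schema.getClass() == wanted ? wanted.cast(schema) : null;
    }

    public static void main(String[] args) {
        String ns = "http://www.test.com/emx/elements/1.1/";

        SchemaRegistrySketch plain = new SchemaRegistrySketch();
        // No mapping registered: the lookup by class comes back empty.
        System.out.println(findByClass(plain.wrap(ns, "test"), CustomSchema.class)); // prints null

        SchemaRegistrySketch aware = new SchemaRegistrySketch();
        aware.addMapping(ns, CustomSchema.class);
        // Mapping registered: the custom class is instantiated and found.
        System.out.println(findByClass(aware.wrap(ns, "test"), CustomSchema.class) != null); // prints true
    }
}
```

In the toy model, the "plain" registry plays the role of the XMPMetadata returned by exportXMPMetadata(), and the "aware" registry plays the role of EMXMetadata after addXMLNSMapping has been called.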