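I installed a PDF printer on Ubuntu, so whenever I print a file it generates a .pdf file. Now I want to read the metadata of those PDF files (title, creation date, modification date, producer, author, etc.) using Tika or plain Java. I tried using the jar on Ubuntu, but it did not return the creation date or the modification date. Is it possible to read the complete metadata of a PDF with Tika on Ubuntu? If anyone knows how, please let me know.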
Answer 0 (score: 0)
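You can do this with a Java program. You need to download the library that provides com.lowagie.text.pdf.PdfReader and add it to the classpath.
Unfortunately I don't know anything about Tika, so I hope the plain Java approach you mentioned works for you.
Here is the Java program from the link mentioned above: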
import java.util.Map;

import com.lowagie.text.pdf.PdfReader;

public class MainClass {
    public static void main(String[] args) throws Exception {
        PdfReader reader = new PdfReader("2.pdf"); // change to your file name
        try {
            // Document information dictionary: Title, Creator, CreationDate, ModDate, ...
            Map<?, ?> info = reader.getInfo();
            for (Map.Entry<?, ?> entry : info.entrySet()) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }

            // XMP metadata stream, if the file has one (XMP packets are typically UTF-8)
            byte[] xmp = reader.getMetadata();
            if (xmp == null) {
                System.out.println("No XML Metadata.");
            } else {
                System.out.println("XML Metadata: " + new String(xmp, "UTF-8"));
            }
        } finally {
            reader.close(); // release the underlying file
        }
    }
}
The output looks like this:
ModDate: D:20120928204721+01'00'
Creator: Adobe Acrobat 10.0
CreationDate: D:20120916150806+01'00'
Producer: Adobe Acrobat 10.14 Paper Capture Plug-in with ClearScan
Title:
XML Metadata: <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.2-c001 63.139439, 2010/09/27-13:37:26 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/">
<xmp:ModifyDate>2012-09-28T20:47:21+01:00</xmp:ModifyDate>
<xmp:CreateDate>2012-09-16T15:08:06+01:00</xmp:CreateDate>
<xmp:MetadataDate>2012-09-28T20:47:21+01:00</xmp:MetadataDate>
<xmp:CreatorTool>Adobe Acrobat 10.0</xmp:CreatorTool>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmpMM:DocumentID>uuid:91129bea-7273-4b3d-924f-5f47a5b55fbf</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:3a02e281-e35f-454a-bac1-adf1bb833636</xmpMM:InstanceID>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
<pdf:Producer>Adobe Acrobat 10.14 Paper Capture Plug-in with ClearScan</pdf:Producer>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
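Since the question is mainly about the creation and modification dates, note that the CreationDate and ModDate values above are raw PDF date strings (D:YYYYMMDDHHmmSS followed by a UTC offset), not ISO timestamps. Below is a minimal sketch of converting such a string with java.time only. The class name PdfDateExample and the method parsePdfDate are made up for illustration and are not part of the PdfReader library. The sketch assumes the full date form shown in the output above and treats a missing offset as UTC, which is an assumption rather than something the file itself states.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any PDF library.
public class PdfDateExample {

    // D:YYYYMMDDHHmmSS, then optionally Z or an offset such as +01'00'.
    // Shorter date forms (for example a year only) are not handled here.
    private static final Pattern PDF_DATE = Pattern.compile(
            "D:(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})"
            + "(?:Z|([+-])(\\d{2})'?(\\d{2})?'?)?");

    public static OffsetDateTime parsePdfDate(String raw) {
        Matcher m = PDF_DATE.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported PDF date: " + raw);
        }

        // Assumption: no explicit offset (or Z) is treated as UTC.
        ZoneOffset offset = ZoneOffset.UTC;
        if (m.group(7) != null) {
            int sign = m.group(7).equals("-") ? -1 : 1;
            int hours = Integer.parseInt(m.group(8));
            int minutes = m.group(9) == null ? 0 : Integer.parseInt(m.group(9));
            offset = ZoneOffset.ofHoursMinutes(sign * hours, sign * minutes);
        }

        return OffsetDateTime.of(
                Integer.parseInt(m.group(1)),  // year
                Integer.parseInt(m.group(2)),  // month
                Integer.parseInt(m.group(3)),  // day
                Integer.parseInt(m.group(4)),  // hour
                Integer.parseInt(m.group(5)),  // minute
                Integer.parseInt(m.group(6)),  // second
                0,                             // nanoseconds
                offset);
    }

    public static void main(String[] args) {
        // ModDate value taken from the output above
        System.out.println(parsePdfDate("D:20120928204721+01'00'"));
        // prints 2012-09-28T20:47:21+01:00
    }
}
```

The result matches the xmp:ModifyDate value in the XMP packet, so either source can be used when both are present. If the info dictionary of a printer-generated file has no CreationDate or ModDate key at all, checking the XMP packet is the other place to look.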