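I only need to modify the metadata of an OpenOffice file. How can I do this without loading the whole file (file.odt) into memory? I only need to work with the file meta.xml and the tag: ... metadata ...
I am using Apache ODF Toolkit 0.5-incubating. My code loads the meta.xml file, but I cannot get the metadata: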
OdfPackage pkg = OdfPackage.loadPackage(new File("file.odt"));
Node d = pkg.getDom("meta.xml").getElementsByTagName("office:document-meta").item(0);
for (int i = 0; i < d.getAttributes().getLength(); i++) {
    String nombre = d.getAttributes().item(i).getNodeName();
    String valor = d.getAttributes().item(i).getNodeValue();
    System.out.println("Clave: " + nombre + " valor: " + valor);
}
Answer 0 (score: 3)
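If you want to work with a variety of file formats, Apache Tika is your best bet. Tika provides a common interface for extracting text and metadata from a large number of formats, and it hides the complexity of the different types and formats.
From the command line, extract the metadata from this sample file with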
java -jar tika-app-1.4.jar --metadata quick.odt
and you will get a large amount of metadata:
Author: Jesper Steen Møller
Character Count: 43
Content-Length: 7042
Content-Type: application/vnd.oasis.opendocument.text
Creation-Date: 2005-09-06T23:34:00
Edit-Time: PT2M0S
Image-Count: 0
Keywords: Pangram, fox, dog
Last-Modified: 2005-09-06T23:49:00
Last-Save-Date: 2005-09-06T23:49:00
Object-Count: 0
Page-Count: 1
Paragraph-Count: 1
Table-Count: 0
Word-Count: 9
cp:subject: Gym class featuring a brown fox and lazy dog
creator: Jesper Steen Møller
date: 2005-09-06T23:49:00
dc:creator: Jesper Steen Møller
dc:description: Gym class featuring a brown fox and lazy dog
dc:language: en-US
dc:subject: Pangram, fox, dog
dc:title: The quick brown fox jumps over the lazy dog
dcterms:created: 2005-09-06T23:34:00
dcterms:modified: 2005-09-06T23:49:00
description: Gym class featuring a brown fox and lazy dog
editing-cycles: 5
generator: OpenOffice.org/1.9.125$Win32 OpenOffice.org_project/680m125$Build-8947
initial-creator: Nevin Nollop
language: en-US
meta:author: Jesper Steen Møller
meta:character-count: 43
meta:creation-date: 2005-09-06T23:34:00
meta:image-count: 0
meta:initial-author: Nevin Nollop
meta:object-count: 0
meta:page-count: 1
meta:paragraph-count: 1
meta:save-date: 2005-09-06T23:49:00
meta:table-count: 0
meta:word-count: 9
modified: 2005-09-06T23:49:00
nbCharacter: 43
nbImg: 0
nbObject: 0
nbPage: 1
nbPara: 1
nbTab: 0
nbWord: 9
resourceName: quick.odt
subject: Gym class featuring a brown fox and lazy dog
title: The quick brown fox jumps over the lazy dog
xmpTPg:NPages: 1
From Java, you can use something as simple as
TikaConfig tika = TikaConfig.getDefaultConfig();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
InputStream input = TikaInputStream.get(new File("test.ods"));
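// DefaultHandler (org.xml.sax.helpers) ignores the content events; only the metadata is collected
tika.getParser().parse(input, new DefaultHandler(), metadata, context);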
and you will get the metadata on the Metadata object.
Answer 1 (score: 0)
You can use the OdfDocument class provided by org.odftoolkit. You can get the dependency here => https://mvnrepository.com/artifact/org.odftoolkit/odfdom-java
You can parse the document with
OdfDocument odfDocument = OdfDocument.loadDocument(new URL(URLPath).openStream());
and get the metadata like this
wordCount = odfDocument.getOfficeMetadata().getDocumentStatistic().getWordCount();
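Coming back to the original question (touching only meta.xml), here is a minimal sketch that uses only JDK classes and no ODF library. It assumes the .odt is an ordinary ZIP archive with a meta.xml entry whose metadata sits in child elements of office:meta, which is also why looping over the attributes of office:document-meta only shows namespace declarations. The class name OdtMetaReader and the method readMeta are made up for illustration. It only reads; writing the changed meta.xml back into the archive is not covered.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of ODF Toolkit or Tika.
class OdtMetaReader {
    // Namespace URI of the "office" prefix in OpenDocument files.
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    // Returns the element children of <office:meta> as qualified name -> text content.
    // Repeated elements (e.g. several meta:user-defined) overwrite each other, and
    // elements that carry their data in attributes (meta:document-statistic) map to "".
    static Map<String, String> readMeta(File odt) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();
        // ZipFile reads the archive directory; only meta.xml is decompressed below.
        try (ZipFile zip = new ZipFile(odt)) {
            ZipEntry entry = zip.getEntry("meta.xml");
            if (entry == null) {
                throw new IOException("No meta.xml entry in " + odt);
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            try (InputStream in = zip.getInputStream(entry)) {
                Document doc = factory.newDocumentBuilder().parse(in);
                NodeList metas = doc.getElementsByTagNameNS(OFFICE_NS, "meta");
                if (metas.getLength() == 0) {
                    return result;
                }
                NodeList children = metas.item(0).getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    Node child = children.item(i);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        result.put(child.getNodeName(), child.getTextContent());
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java OdtMetaReader file.odt");
            return;
        }
        readMeta(new File(args[0])).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Run it as java OdtMetaReader file.odt to print one line per metadata element. The keys use whatever prefixes the file itself declares, such as dc:title or meta:initial-creator.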