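I am trying to read the summary information of a file in Java, but I can't get anything back. I tried org.apache.poi.hpsf.*.
I need the author, subject, comments, keywords, and title.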
File rep = new File("C:\\Cry_ReportERP006.rpt");
/* Read a test document into a POI filesystem. */
final POIFSFileSystem poifs = new POIFSFileSystem(new FileInputStream(rep));
final DirectoryEntry dir = poifs.getRoot();
DocumentEntry dsiEntry = null;
try
{
dsiEntry = (DocumentEntry) dir.getEntry(DocumentSummaryInformation.DEFAULT_STREAM_NAME);
}
catch (FileNotFoundException ex)
{
/*
* A missing document summary information stream is not an error
* and therefore silently ignored here.
*/
}
/*
* If there is a document summary information stream, read it from
* the POI filesystem.
*/
if (dsiEntry != null)
{
final DocumentInputStream dis = new DocumentInputStream(dsiEntry);
final PropertySet ps = new PropertySet(dis);
final DocumentSummaryInformation dsi = new DocumentSummaryInformation(ps);
final SummaryInformation si = new SummaryInformation(ps);
/* Execute the get... methods. */
System.out.println(si.getAuthor());
}
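Answer 0 (score: 2)
As described in the POI overview at http://poi.apache.org/overview.html, POI provides different parsers for different file types. The following example extracts the author/creator from an Office 2003 file: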
public static String parseOLE2FileAuthor(File file) {
String author=null;
try {
FileInputStream stream = new FileInputStream(file);
POIFSFileSystem poifs = new POIFSFileSystem(stream);
DirectoryEntry dir = poifs.getRoot();
DocumentEntry siEntry = (DocumentEntry)dir.getEntry(SummaryInformation.DEFAULT_STREAM_NAME);
DocumentInputStream dis = new DocumentInputStream(siEntry);
PropertySet ps = new PropertySet(dis);
SummaryInformation si = new SummaryInformation(ps);
author=si.getAuthor();
stream.close();
} catch (IOException ex) {
// getStackTrace() only returns the trace; printStackTrace() actually reports it
ex.printStackTrace();
} catch (NoPropertySetStreamException ex) {
ex.printStackTrace();
} catch (MarkUnsupportedException ex) {
ex.printStackTrace();
} catch (UnexpectedPropertySetTypeException ex) {
ex.printStackTrace();
}
return author;
}
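Since the file in the question is a .rpt rather than an Office document, it may be worth checking which container format it actually uses before choosing a parser. The sketch below uses only the JDK, not POI: it compares the first bytes of a file against the OLE2 compound file signature and the ZIP local-header signature. The class and method names (ContainerSniffer, classify) are hypothetical, and a ZIP result only means the file could be an OOXML package, not that it is one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper (not part of POI): guesses the container format from the leading bytes. */
final class ContainerSniffer {

    /** The 8-byte signature at the start of an OLE2 compound file (.doc, .xls, .ppt, ...). */
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** ZIP local file header signature, found at the start of .docx/.pptx/.xlsx packages. */
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    /** Returns true if data begins with the given prefix. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Classifies a header as "OLE2", "ZIP", or "UNKNOWN". */
    static String classify(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    /** Reads at most the first 8 bytes of the file and classifies them. */
    static String classify(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return classify(in.readNBytes(8));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ContainerSniffer <file>");
            return;
        }
        System.out.println(classify(Path.of(args[0])));
    }
}
```

If the result is "OLE2", the POIFSFileSystem approach above is the one to try; if it is "UNKNOWN", the file is not a compound document and will not contain a summary information stream that HPSF can read.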
For docx, pptx, and xlsx files, POI has dedicated classes. Example for a .docx file:
public static String parseDOCX(File file){
String author=null;
FileInputStream stream;
try {
stream = new FileInputStream(file);
XWPFDocument docx = new XWPFDocument(stream);
CoreProperties props = docx.getProperties().getCoreProperties();
author=props.getCreator();
stream.close();
} catch (FileNotFoundException ex) {
ex.printStackTrace();
} catch (IOException ex) {
ex.printStackTrace();
}
return author;
}
For PPTX use XMLSlideShow, and for XLSX use XSSFWorkbook, instead of XWPFDocument.
Answer 1 (score: 1)
You can find sample code here: Apache POI how-to
In short, you can write a listener, MyPOIFSReaderListener:
SummaryInformation si = (SummaryInformation)
PropertySetFactory.create(event.getStream());
String title = si.getTitle();
String Author= si.getLastAuthor();
......
and register it as follows:
POIFSReader r = new POIFSReader();
r.registerListener(new MyPOIFSReaderListener(),
"\005SummaryInformation");
r.read(new FileInputStream(filename));
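The stream name passed to registerListener starts with the control character U+0005, written in the snippet as the octal escape \005. The JDK-only sketch below spells out the two property-set stream names involved; POI exposes them as SummaryInformation.DEFAULT_STREAM_NAME and DocumentSummaryInformation.DEFAULT_STREAM_NAME. The class SummaryStreamNames and its printable method are hypothetical helpers added for illustration.

```java
/** Hypothetical helper (not part of POI) that spells out the property-set stream names. */
final class SummaryStreamNames {

    /** Same value as the literal passed to registerListener in the snippet above. */
    static final String SUMMARY = "\u0005SummaryInformation";

    /** Name of the separate stream that holds the document summary information. */
    static final String DOCUMENT_SUMMARY = "\u0005DocumentSummaryInformation";

    /** Renders control characters visibly, which helps when logging entry names. */
    static String printable(String streamName) {
        StringBuilder sb = new StringBuilder();
        for (char c : streamName.toCharArray()) {
            if (c < 0x20) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The octal and Unicode escapes denote the same leading character.
        System.out.println("\005SummaryInformation".equals(SUMMARY));
        System.out.println(printable(SUMMARY));
        System.out.println(printable(DOCUMENT_SUMMARY));
    }
}
```

Author, title, subject, keywords, and comments live in the first stream, so a listener that wants those fields should be registered for the SummaryInformation name rather than the DocumentSummaryInformation one.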
Answer 2 (score: 0)
For Office 2003 files, you can use the classes that inherit from POIDocument. Here is an example for a doc file:
FileInputStream in = new FileInputStream(file);
HWPFDocument doc = new HWPFDocument(in);
author = doc.getSummaryInformation().getAuthor();
Likewise, use HSLFSlideShowImpl for ppt,
HSSFWorkbook for xls,
and HDGFDiagram for vsd.
The SummaryInformation class exposes many other pieces of file information as well.
For Office 2007 or later files, see @Dragos Catalin Trieanu's answer.
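The class names mentioned across the answers can be collected into a small lookup keyed by file extension. This is only a sketch: PoiDocumentClasses and forFileName are hypothetical names, the map stores simple class names as strings rather than loading POI, and an extension is a hint about the format, not proof of it.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup (not part of POI): file extension to the POI document class named in this thread. */
final class PoiDocumentClasses {

    private static final Map<String, String> BY_EXTENSION = Map.of(
        // Office 2003 (OLE2) formats, classes derived from POIDocument
        "doc", "HWPFDocument",
        "ppt", "HSLFSlideShowImpl",
        "xls", "HSSFWorkbook",
        "vsd", "HDGFDiagram",
        // Office 2007+ (OOXML) formats
        "docx", "XWPFDocument",
        "pptx", "XMLSlideShow",
        "xlsx", "XSSFWorkbook"
    );

    /** Returns the class name for the file's extension, or empty if the extension is not covered. */
    static Optional<String> forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(extension));
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + " -> " + forFileName(name).orElse("(no dedicated class listed)"));
        }
    }
}
```

An extension such as .rpt, as in the question, is not in this table, so for such files the generic POIFSFileSystem approach from the first answer is the fallback, provided the file is an OLE2 container at all.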