How do I extract the contents of a '.msg' file generated by Outlook?

Date: 2015-06-26 17:02:07

Tags: apache-tika

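When a message is saved in Microsoft Outlook, it is stored as a '.msg' file that contains the entire content of the email along with its attachment files. I want to extract the text of the email body and its attachments. Does Apache Tika support '.msg' files? If not, are there any other ideas?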

2 answers:

Answer 0 (score: 1)

If you look at the list of mail formats supported by Apache Tika 1.9 (the latest version at the time of writing), you will see that Outlook MSG files are listed as supported.
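Tika can handle these files because an Outlook MSG file is an OLE2 Compound File Binary container, the same container format as legacy .doc and .xls files, which Apache POI reads in Java; that is also why the --metadata output below lists OfficeParser under X-Parsed-By. If you only want a quick sanity check before handing a file to Tika, here is a minimal sketch (the class name Ole2Signature is hypothetical; Java 11+ is assumed) that looks for the fixed 8-byte OLE2 header signature. A match only tells you the file is an OLE2 container, not that it is specifically an MSG file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: checks for the 8-byte OLE2 (Compound File Binary) header
// signature. A match means "OLE2 container" (MSG, DOC, XLS, ...), not "MSG file".
final class Ole2Signature {
    static final byte[] MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // True if the header bytes start with the OLE2 signature.
    static boolean matches(byte[] header) {
        return header.length >= MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    // Reads at most the first 8 bytes of the file (readNBytes needs Java 11+).
    static boolean isOle2File(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return matches(in.readNBytes(MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean ole2 = isOle2File(Paths.get(name));
            System.out.println(name + ": " + (ole2 ? "OLE2 container" : "not an OLE2 file"));
        }
    }
}
```

Running it as java Ole2Signature simple_test_msg.msg should report an OLE2 container for a genuine MSG file; a ZIP-based file such as .docx will not match.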

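Taking a simple example MSG file from the Apache POI project's test files, and using the standalone Tika App jar to keep testing simple, we can easily get both the content and the metadata: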

    $ java -jar tika-app-1.9.jar --metadata simple_test_msg.msg
    Author: Travis Ferguson
    Content-Length: 16896
    Content-Type: application/vnd.ms-outlook
    Creation-Date: 2007-07-06T05:27:17Z
    Last-Modified: 2007-07-06T05:27:17Z
    Last-Save-Date: 2007-07-06T05:27:17Z
    Message-Bcc: 
    Message-Cc: 
    Message-From: Travis Ferguson
    Message-Recipient-Address: travis@overwrittenstack.com
    Message-To: travis@overwrittenstack.com
    X-Parsed-By: org.apache.tika.parser.DefaultParser
    X-Parsed-By: org.apache.tika.parser.microsoft.OfficeParser
    creator: Travis Ferguson
    date: 2007-07-06T05:27:17Z
    dc:creator: Travis Ferguson
    dc:description: test message
    dc:title: test message
    dcterms:created: 2007-07-06T05:27:17Z
    dcterms:modified: 2007-07-06T05:27:17Z
    meta:author: Travis Ferguson
    meta:creation-date: 2007-07-06T05:27:17Z
    meta:save-date: 2007-07-06T05:27:17Z
    modified: 2007-07-06T05:27:17Z
    resourceName: simple_test_msg.msg
    subject: test message
    title: test message

    $ java -jar tika-app-1.9.jar --text simple_test_msg.msg
    test message
    From
    Travis Ferguson
    To
    travis@overwrittenstack.com
    Recipients
    travis@overwrittenstack.com

    This is a test message.

Metadata (sender, recipients, dates and so on) plus the text: everything you want!
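If you would rather consume that metadata from another Java program than read it by eye, one option is to run the same tika-app command and parse its output. Below is a minimal sketch, assuming Java 11+, that java is on the PATH, and that the output keeps the one "Name: value" per line shape shown above. The class TikaMetadataOutput and its methods are hypothetical, not part of Tika. It splits on the first ": " because keys such as dc:creator and date values contain colons of their own, and it collects values into lists because keys such as X-Parsed-By repeat.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns tika-app's "--metadata" output ("Name: value" per line)
// into a multimap, since keys such as X-Parsed-By can repeat.
final class TikaMetadataOutput {

    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            // Split on the first ": " so keys like "dc:creator" and values like
            // "2007-07-06T05:27:17Z" keep their own colons.
            int sep = line.indexOf(": ");
            String key;
            String value;
            if (sep >= 0) {
                key = line.substring(0, sep);
                value = line.substring(sep + 2).trim();
            } else if (line.endsWith(":")) {
                // Empty value whose trailing space was trimmed, e.g. "Message-Bcc:".
                key = line.substring(0, line.length() - 1);
                value = "";
            } else {
                continue; // not a "Name: value" line
            }
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    // Runs "java -jar <tikaJar> --metadata <msgFile>" and parses its standard output.
    // Assumes the output uses the platform default charset; stderr goes to this console.
    static Map<String, List<String>> runTikaApp(String tikaJar, String msgFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder("java", "-jar", tikaJar, "--metadata", msgFile)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("tika-app exited with status " + exit);
        }
        return parse(lines);
    }
}
```

For the sample file, runTikaApp("tika-app-1.9.jar", "simple_test_msg.msg").get("subject") should give [test message]. Calling Tika's Java API directly (for example AutoDetectParser) avoids the extra process; this sketch simply stays with the command used in this answer.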

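Alternatively, if you have special needs or requirements and want full control, you can use the underlying Apache POI HSMF library to parse the MSG file yourself; see the HSMF unit tests for usage examples.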
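As background on what HSMF parses: per Microsoft's [MS-OXMSG] specification, each string or binary MAPI property of the message is stored in its own stream inside the OLE2 container, named __substg1.0_ followed by four hex digits of property ID and four of property type (for example __substg1.0_0037001F is the subject as a Unicode string). The sketch below (hypothetical class MsgStreamName, Java 9+) only decodes such names and UTF-16LE string contents; actually reading the streams out of the container is the part POI's POIFS and HSMF do for you, and the ID table is a small, incomplete sample.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper: decodes MSG property stream names "__substg1.0_PPPPTTTT",
// where PPPP is the MAPI property ID and TTTT the property type, both in hex.
final class MsgStreamName {
    static final String PREFIX = "__substg1.0_";

    // A small, incomplete sample of well-known MAPI property IDs.
    static final Map<Integer, String> KNOWN_IDS = Map.of(
            0x0037, "Subject",
            0x0C1A, "SenderName",
            0x0E04, "DisplayTo",
            0x1000, "Body",
            0x3701, "AttachDataBinary",
            0x3704, "AttachFilename",
            0x3707, "AttachLongFilename");

    static final class Property {
        final int id;
        final int type;

        Property(int id, int type) {
            this.id = id;
            this.type = type;
        }

        String name() {
            return KNOWN_IDS.getOrDefault(id, String.format("0x%04X", id));
        }

        String typeName() {
            switch (type) {
                case 0x001E: return "PT_STRING8"; // 8-bit string in the message's code page
                case 0x001F: return "PT_UNICODE"; // UTF-16LE string
                case 0x0102: return "PT_BINARY";  // raw bytes, e.g. attachment data
                default: return String.format("0x%04X", type);
            }
        }
    }

    // Returns null when the name is not a property stream (e.g. "__properties_version1.0").
    static Property parse(String streamName) {
        if (!streamName.startsWith(PREFIX)) {
            return null;
        }
        String hex = streamName.substring(PREFIX.length());
        if (!hex.matches("[0-9A-Fa-f]{8}")) {
            return null;
        }
        int id = Integer.parseInt(hex.substring(0, 4), 16);
        int type = Integer.parseInt(hex.substring(4), 16);
        return new Property(id, type);
    }

    // PT_UNICODE streams hold UTF-16LE text; cut at the first NUL in case one is present.
    static String decodeUnicode(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_16LE);
        int nul = s.indexOf('\0');
        return nul >= 0 ? s.substring(0, nul) : s;
    }
}
```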

Answer 1 (score: -1)

Tika supports msg files.

You can also use Apache POI; there are some examples around, for example this one.

Sample:

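    import java.util.List;

    import com.auxilii.msgparser.Message;
    import com.auxilii.msgparser.MsgParser;
    import com.auxilii.msgparser.attachment.Attachment;
    import com.auxilii.msgparser.attachment.FileAttachment;

    // Assumes the msgparser library (package com.auxilii.msgparser), which reads
    // MSG files through Apache POI; the class name ReadMsg is arbitrary.
    public class ReadMsg {
      public static void main(String[] args) throws Exception {
        MsgParser msgp = new MsgParser();
        Message msg = msgp.parseMsg("c:/temp/test2.msg");

        // Sender, subject and plain-text body of the message
        String fromEmail = msg.getFromEmail();
        String fromName = msg.getFromName();
        String subject = msg.getSubject();
        String body = msg.getBodyText();

        System.out.println("From :" + fromName + " <" + fromEmail + ">");
        System.out.println("Subject :" + subject);
        System.out.println();
        System.out.println(body);
        System.out.println();

        // Typed list so the for-each loop compiles (a raw List yields Object)
        List<Attachment> atts = msg.getAttachments();
        for (Attachment att : atts) {
          // Only file attachments are handled; other kinds (e.g. embedded messages) are skipped
          if (att instanceof FileAttachment) {
            FileAttachment file = (FileAttachment) att;
            System.out.println("Attachment : " + file.getFilename());
            // The attachment's raw bytes are available via
            // byte[] data = file.getData();
          }
        }
      }
    }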
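To actually save the attachments rather than just list them, the bytes from file.getData() can be written to disk. Below is a minimal stdlib-only sketch (the class AttachmentSaver is hypothetical). Because the file name comes from the message itself, it keeps only the last path component and replaces characters that are invalid on Windows, so that a crafted name such as ..\..\evil.bat cannot write outside the target directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes one attachment's bytes into outDir under a sanitized name.
final class AttachmentSaver {

    // Keeps only the last path component of the stored file name and replaces
    // characters that are invalid in Windows file names.
    static String safeFileName(String name) {
        String base = name == null ? "" : name.replace('\\', '/');
        base = base.substring(base.lastIndexOf('/') + 1).trim();
        base = base.replaceAll("[:*?\"<>|\\x00]", "_");
        if (base.isEmpty() || base.equals(".") || base.equals("..")) {
            base = "attachment.bin";
        }
        return base;
    }

    // Creates outDir if needed and writes the bytes; an existing file with the
    // same name is overwritten (the default behaviour of Files.write).
    static Path save(Path outDir, String fileName, byte[] data) throws IOException {
        Files.createDirectories(outDir);
        Path dir = outDir.toAbsolutePath().normalize();
        Path target = dir.resolve(safeFileName(fileName)).normalize();
        // Defense in depth: the sanitized name must land directly inside dir.
        if (!dir.equals(target.getParent())) {
            throw new IOException("Refusing to write outside " + dir + ": " + fileName);
        }
        Files.write(target, data);
        return target;
    }
}
```

Inside the sample's loop this would be something like AttachmentSaver.save(Paths.get("c:/temp/attachments"), file.getFilename(), file.getData()); with java.nio.file.Paths imported. Two attachments with the same name overwrite each other in this sketch; add a counter suffix if that matters for your mail.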