How do I extract an Outlook message nested inside another one using Apache POI HSMF?

Asked: 2012-11-25 05:00:39

Tags: java outlook apache-poi

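I am using Apache POI HSMF to extract attachments from Outlook msg files. It works fine except for nested messages. If a msg is attached to another msg, I can get its files. If a message is nested, I get its information as a MAPIMessage object, but what I need is the file itself.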

MAPIMessage msg = new MAPIMessage(fileName);
for (AttachmentChunks attachment : msg.getAttachmentFiles()) {
    if (attachment.attachmentDirectory != null) {
        MAPIMessage nestedMsg = attachment.attachmentDirectory.getAsEmbededMessage();
        // now save nestedMsg as a msg-file
    }
}

Is it possible to save the nested message as a regular msg file?

3 answers:

Answer 0 (score: 4)

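Promoting a comment to an answer. I can tell you how to extract an embedded Outlook message into a new file that Apache POI will then happily open. What I'm less sure about is whether an embedded message contains everything Outlook expects to find in a standalone message, so I can't guarantee that the resulting file will open in Outlook without problems...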

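First, some background on embedded resources in Outlook. Depending on its type, a resource may be stored in a regular byte chunk, in some other special kind of chunk (e.g. compressed RTF), or as a self-contained subdirectory within the file. Embedded messages are stored the latter way.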

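To extract an embedded message, create a new OLE2 file container with POIFSFileSystem (all Outlook messages are stored in OLE2 containers). Then copy the contents of the embedded message's directory from the source OLE2 container into the root of the new one. Finally, write the POIFSFileSystem out to a new file, and the extraction is done!

You would probably want to do something like this: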

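 import java.io.File;
 import java.io.FileOutputStream;
 import org.apache.poi.hsmf.MAPIMessage;
 import org.apache.poi.hsmf.datatypes.AttachmentChunks;
 import org.apache.poi.poifs.filesystem.EntryUtils;
 import org.apache.poi.poifs.filesystem.NPOIFSFileSystem;
 import org.apache.poi.poifs.filesystem.POIFSFileSystem;

 // Written against the POI 3.x API; in newer releases NPOIFSFileSystem is gone
 // (use POIFSFileSystem) and the directory is read via getAttachmentDirectory().
 MAPIMessage msg = new MAPIMessage(new NPOIFSFileSystem(new File("test.msg")));
 int number = 0;
 for (AttachmentChunks att : msg.getAttachmentFiles()) {
     // Embedded messages live in a subdirectory rather than in a byte chunk
     if (att.attachmentDirectory != null) {
         number++;
         POIFSFileSystem newMsg = new POIFSFileSystem();
         // Copy the embedded message's directory into the new container's root
         EntryUtils.copyNodes(att.attachmentDirectory.getDirectory(), newMsg.getRoot());
         try (FileOutputStream out = new FileOutputStream("embedded-" + number + ".msg")) {
             newMsg.writeFilesystem(out);
         }
     }
 }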

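If Outlook is unhappy with the result, try opening the source file in Outlook and saving the embedded message to a new file. Then use something like org.apache.poi.poifs.dev.POIFSLister or org.apache.poi.poifs.dev.POIFSDump to compare the Outlook-extracted file with the POI-extracted one, and see whether you can spot any changes Outlook made...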

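Answer 1 (score: 0)

I have added some necessary functionality to POI, along with a unit test that extracts an embedded MSG which can be opened in Outlook. Handling of the named id properties is probably not complete (it was not relevant for the unit test that verifies my enhancement).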

https://svn.apache.org/viewvc/poi/trunk/src/scratchpad/testcases/org/apache/poi/hsmf/TestExtractEmbeddedMSG.java?view=markup

Unit test example:

Get the attached MSG

MAPIMessage attachedMsg = attachments[0].getEmbeddedMessage();

Method that rebuilds the attached MSG

private POIFSFileSystem rebuildFromAttached(MAPIMessage attachedMsg) throws IOException {
    // Create new MSG and copy properties.
    POIFSFileSystem newDoc = new POIFSFileSystem();
    MessagePropertiesChunk topLevelChunk = new MessagePropertiesChunk(null);
    // Copy attachments and recipients.
    int recipientscount = 0;
    int attachmentscount = 0;
    for (Entry entry : attachedMsg.getDirectory()) {
        if (entry.getName().startsWith(RecipientChunks.PREFIX)) {
            recipientscount++;
            DirectoryEntry newDir = newDoc.createDirectory(entry.getName());
            for (Entry e : ((DirectoryEntry) entry)) {
                EntryUtils.copyNodeRecursively(e, newDir);
            }
        } else if (entry.getName().startsWith(AttachmentChunks.PREFIX)) {
            attachmentscount++;
            DirectoryEntry newDir = newDoc.createDirectory(entry.getName());
            for (Entry e : ((DirectoryEntry) entry)) {
                EntryUtils.copyNodeRecursively(e, newDir);
            }
        }
    }
    // Copy properties from properties stream.
    MessagePropertiesChunk mpc = attachedMsg.getMainChunks().getMessageProperties();
    for (Map.Entry<MAPIProperty, PropertyValue> p : mpc.getRawProperties().entrySet()) {
        PropertyValue val = p.getValue();
        if (!(val instanceof ChunkBasedPropertyValue)) {
            MAPIType type = val.getActualType();
            if (type != null && type != Types.UNKNOWN) {
                topLevelChunk.setProperty(val);
            }
        }
    }
    // Create nameid entries.
    DirectoryEntry nameid = newDoc.getRoot().createDirectory(NameIdChunks.NAME);
    // GUID stream
    nameid.createDocument(PropertiesChunk.DEFAULT_NAME_PREFIX + "00020102", new ByteArrayInputStream(new byte[0]));
    // Entry stream
    nameid.createDocument(PropertiesChunk.DEFAULT_NAME_PREFIX + "00030102", new ByteArrayInputStream(new byte[0]));
    // String stream
    nameid.createDocument(PropertiesChunk.DEFAULT_NAME_PREFIX + "00040102", new ByteArrayInputStream(new byte[0]));
    // Base properties.
    // Attachment/Recipient counter.
    topLevelChunk.setAttachmentCount(attachmentscount);
    topLevelChunk.setRecipientCount(recipientscount);
    topLevelChunk.setNextAttachmentId(attachmentscount);
    topLevelChunk.setNextRecipientId(recipientscount);
    // Unicode string format.
    byte[] storeSupportMaskData = new byte[4];
    PropertyValue.LongPropertyValue storeSupportPropertyValue = new PropertyValue.LongPropertyValue(MAPIProperty.STORE_SUPPORT_MASK,
            MessagePropertiesChunk.PROPERTIES_FLAG_READABLE | MessagePropertiesChunk.PROPERTIES_FLAG_WRITEABLE,
            storeSupportMaskData);
    storeSupportPropertyValue.setValue(0x00040000);
    topLevelChunk.setProperty(storeSupportPropertyValue);
    topLevelChunk.setProperty(new PropertyValue(MAPIProperty.HASATTACH,
            MessagePropertiesChunk.PROPERTIES_FLAG_READABLE | MessagePropertiesChunk.PROPERTIES_FLAG_WRITEABLE,
            attachmentscount == 0 ? new byte[] { 0 } : new byte[] { 1 }));
    // Copy properties from MSG file system.
    for (Chunk chunk : attachedMsg.getMainChunks().getChunks()) {
        if (!(chunk instanceof MessagePropertiesChunk)) {
            String entryName = chunk.getEntryName();
            String entryType = entryName.substring(entryName.length() - 4);
            int iType = Integer.parseInt(entryType, 16);
            MAPIType type = Types.getById(iType);
            if (type != null && type != Types.UNKNOWN) {
                MAPIProperty mprop = MAPIProperty.createCustom(chunk.getChunkId(), type, chunk.getEntryName());
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                chunk.writeValue(data);
                PropertyValue pval = new PropertyValue(mprop, MessagePropertiesChunk.PROPERTIES_FLAG_READABLE
                        | MessagePropertiesChunk.PROPERTIES_FLAG_WRITEABLE, data.toByteArray(), type);
                topLevelChunk.setProperty(pval);
            }
        }
    }
    topLevelChunk.writeProperties(newDoc.getRoot());
    return newDoc;
}
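
The last loop in the method above derives each chunk's MAPI type from the final four hex digits of its stream name. As a minimal, POI-free sketch of that naming convention: the class and method names here are hypothetical, and the sample name `__substg1.0_0037001F` (property id 0x0037, type 0x001F) is only an illustration. Names carrying an extra suffix, such as multi-valued entries, are not handled.

```java
// Hypothetical helper, not part of Apache POI.
class SubstgNameSketch {
    // Assumed prefix of single-valued property streams in an MSG file.
    static final String PREFIX = "__substg1.0_";

    // The four hex digits right after the prefix are the property id.
    static int propertyId(String entryName) {
        int start = PREFIX.length();
        return Integer.parseInt(entryName.substring(start, start + 4), 16);
    }

    // The last four hex digits are the property type, as in the answer's code.
    static int typeId(String entryName) {
        String suffix = entryName.substring(entryName.length() - 4);
        return Integer.parseInt(suffix, 16);
    }

    public static void main(String[] args) {
        String name = "__substg1.0_0037001F";
        // prints "id=0x0037 type=0x001F"
        System.out.println(String.format("id=0x%04X type=0x%04X",
                propertyId(name), typeId(name)));
    }
}
```

In the answer's code the parsed type id is then handed to Types.getById, and chunks whose type is unknown are skipped.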

Rebuild the attached MSG

            try (POIFSFileSystem extractedAttachedMsg = rebuildFromAttached(attachedMsg)) {
                try (ByteArrayOutputStream extractedAttachedMsgOut = new ByteArrayOutputStream()) {
                    extractedAttachedMsg.writeFilesystem(extractedAttachedMsgOut);
                    byte[] extractedAttachedMsgRaw = extractedAttachedMsgOut.toByteArray();
                    // this byte array can be persisted to disk and opened in MS Outlook
                }
            }

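Regards, Dominik

Answer 2 (score: 0)

The code in TestExtractEmbeddedMSG.java is somewhat outdated. The unit test still works, but after investigating further, extracting an attached MSG file turns out to be much simpler.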

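Basically, extracting an embedded MSG file needs special handling because an embedded MSG is not a simple BLOB but a subdirectory within the overall structured storage container. This subdirectory contains almost everything that makes up the embedded MSG, with two exceptions that I know of so far: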

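  • The subdirectory does not contain any of the "name id" property entries, which are defined once, globally, for the whole structured storage (they map non-fixed MAPI properties qualified by their property set ID).
  • The binary format of the properties stream differs slightly from that of the top-level stream: it lacks 8 bytes of reserved data (I don't know Microsoft's reason for this, but that is what the documentation says).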

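So, to convert the embedded MSG subdirectory into a top-level one, two things are needed. The top-level "name id" entries have to be copied into the new MSG. This could optionally be optimized so that only the entries actually referenced by the embedded MSG are copied, but then all "name id" entries would have to be parsed and rebuilt with just the referenced ones. In addition, the properties stream has to be rebuilt with the extra 8 bytes of reserved data.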
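
The header change can be illustrated without POI. This is a minimal sketch, assuming the layout this answer describes: an embedded properties stream starts with a 24-byte header (8 reserved bytes plus 16 bytes of recipient/attachment counters), and the top-level stream has 8 more reserved bytes before the property entries. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI.
class PropertiesHeaderSketch {
    // 8 reserved bytes + 16 bytes of recipient/attachment counters.
    static final int EMBEDDED_HEADER_SIZE = 24;
    // Extra reserved bytes present only in a top-level properties stream.
    static final int EXTRA_RESERVED_SIZE = 8;

    static byte[] toTopLevelStream(byte[] embedded) {
        if (embedded.length < EMBEDDED_HEADER_SIZE) {
            throw new IllegalArgumentException("stream shorter than embedded header");
        }
        ByteArrayOutputStream out =
                new ByteArrayOutputStream(embedded.length + EXTRA_RESERVED_SIZE);
        // Keep the existing 24-byte header as-is.
        out.write(embedded, 0, EMBEDDED_HEADER_SIZE);
        // Insert the 8 additional reserved zero bytes.
        out.write(new byte[EXTRA_RESERVED_SIZE], 0, EXTRA_RESERVED_SIZE);
        // Append the property entries unchanged.
        out.write(embedded, EMBEDDED_HEADER_SIZE, embedded.length - EMBEDDED_HEADER_SIZE);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Fake stream: 24-byte header followed by one 16-byte property entry.
        byte[] embedded = new byte[EMBEDDED_HEADER_SIZE + 16];
        Arrays.fill(embedded, EMBEDDED_HEADER_SIZE, embedded.length, (byte) 0x7F);
        byte[] topLevel = toTopLevelStream(embedded);
        // prints "40 -> 48"
        System.out.println(embedded.length + " -> " + topLevel.length);
    }
}
```

Only the header grows; the property entries that follow it are copied byte for byte.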

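This is how I do it now. rootmsg is the file system root of the top-level MSG, and attachedmsg is the file system root of the embedded MSG, which can be obtained by calling getDirectory on the MAPIMessage object. The embedded MAPIMessage object can be retrieved by calling getEmbeddedMessage on the AttachmentChunks object.

Build the structured storage file system: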

  private static POIFSFileSystem rebuildMessageToStream(DirectoryNode rootmsg, DirectoryNode attachedmsg) throws Exception
  {
    //
    // Create new MSG file system and copy all entries.
    //
    POIFSFileSystem newDoc = new POIFSFileSystem();
    //
    // Copy nameid entries from root message.
    //
    if (rootmsg != null) {
      for (Entry entry : rootmsg) {
        if (entry.getName().startsWith(NameIdChunks.NAME)) {
          EntryUtils.copyNodeRecursively(entry, newDoc.getRoot());
        }
      }
    }
    //
    // Copy entries from origin message.
    //
    for (Entry entry : attachedmsg) {
      if (entry.getName().startsWith(PropertiesChunk.NAME) && entry.isDocumentEntry()) {
        if (rootmsg != null) {
          //
          // Rebuild properties stream: Add additional 8 reserved bytes
          // to convert embedded message properties stream to root message properties stream.
          //
          // See MessagePropertiesChunk.writeHeaderData
          //
          DocumentEntry d = (DocumentEntry)entry;
          DocumentInputStream dstream = new DocumentInputStream(d);
          ByteArrayOutputStream rootps = new ByteArrayOutputStream(d.getSize() + 8);
          //
          // Copy first 8 bytes of reserved zeros plus 16 bytes for recipient/attachment counter.
          //
          byte[] data = new byte[24];
          dstream.readFully(data);
          rootps.write(data);
          //
          // Additional 8 bytes of reserved zeros.
          //
          rootps.write(new byte[8]);
          //
          // Properties (remaining data).
          //
          IOUtils.copy(dstream, rootps);
          //
          // Create properties stream entry.
          //
          newDoc.getRoot().createDocument(entry.getName(), new ByteArrayInputStream(rootps.toByteArray()));
          dstream.close();
        }
        else {
          //
          // Copy properties stream unmodified.
          //
          EntryUtils.copyNodeRecursively(entry, newDoc.getRoot());
        }
      }
      else {
        //
        // Copy other entry.
        //
        EntryUtils.copyNodeRecursively(entry, newDoc.getRoot());
      }
    }
    return newDoc;
  }

Serialize the structured storage:

        try (POIFSFileSystem extractedAttachedMsg = rebuildMessageToStream(rootmsg, attachedmsg)) {
            try (ByteArrayOutputStream extractedAttachedMsgOut = new ByteArrayOutputStream()) {
                extractedAttachedMsg.writeFilesystem(extractedAttachedMsgOut);
                byte[] extractedAttachedMsgRaw = extractedAttachedMsgOut.toByteArray();
                // this byte array can be persisted to disk and opened in MS Outlook
            }
        }