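I'm using Apache POI-HSMF to extract attachments from Outlook msg files. It works fine except for nested messages. If a msg is attached to another message, I can get the file. If the message is nested, I get its information, but I need the file.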
MAPIMessage msg = new MAPIMessage(fileName);
for (AttachmentChunks attachment : msg.getAttachmentFiles()) {
    if (attachment.attachmentDirectory != null) {
        // getAsEmbededMessage (sic) is the method name in this POI version
        MAPIMessage nestedMsg = attachment.attachmentDirectory.getAsEmbededMessage();
        // now save nestedMsg as a msg file
    }
}
Is it possible to save the nested message as a regular msg file?
Answer 0 (score: 4)
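Promoting a comment to an answer. I can tell you how to extract the embedded Outlook message to a new file, which Apache POI will then happily open. What I'm less sure about is whether the embedded message contains everything Outlook expects to find in a standalone message, so I can't promise that the resulting file will open in Outlook without problems...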
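First up, embedded resources in Outlook. Depending on their type, they may be stored in a regular byte chunk, in some other kind of special chunk (e.g. compressed RTF), or as a self-contained subdirectory within the file. Embedded messages are stored the latter way.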
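To make the distinction concrete, here is a small stdlib-only sketch that classifies an attachment's child entry by its name. The class and enum names are hypothetical (not part of POI), and it assumes the MS-OXMSG naming convention, where a property entry is called __substg1.0_ followed by four hex digits of property ID and four of property type: attachment data held as a plain byte stream is __substg1.0_37010102 (binary), while an embedded message is a storage (directory) named __substg1.0_3701000D (object). Special chunks such as compressed RTF simply fall into OTHER here.

```java
import java.util.Locale;

// Hypothetical helper, not a POI API: classifies an attachment child entry
// using the MS-OXMSG "__substg1.0_" + 4 hex ID digits + 4 hex type digits scheme.
final class AttachmentEntryKind {
    enum Kind { BINARY_DATA, EMBEDDED_MESSAGE, OTHER }

    private static final String PREFIX = "__substg1.0_";

    static Kind classify(String entryName, boolean isDirectory) {
        if (!entryName.startsWith(PREFIX) || entryName.length() != PREFIX.length() + 8) {
            return Kind.OTHER;
        }
        String tag = entryName.substring(PREFIX.length()).toUpperCase(Locale.ROOT);
        // 0x3701 = attachment data; 0x0102 = binary stream
        if (tag.equals("37010102") && !isDirectory) {
            return Kind.BINARY_DATA;
        }
        // 0x3701 with type 0x000D (object): the embedded message is a whole sub-storage
        if (tag.equals("3701000D") && isDirectory) {
            return Kind.EMBEDDED_MESSAGE;
        }
        return Kind.OTHER;
    }
}
```

Only the EMBEDDED_MESSAGE case needs the directory-copying treatment discussed in this thread; a BINARY_DATA attachment can be written out byte for byte.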
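If you want to extract the embedded message, what you'll want to do is create a new OLE2 file container with POIFSFileSystem (all Outlook messages are stored in OLE2 containers). Then copy the contents of the embedded message's directory in the source OLE2 container into the root of the new container. Finally, write the POIFSFileSystem out to a new file, and the extraction is done!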
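The "copy the directory into the root" step can be pictured without POI at all. The sketch below is a simplification: it models OLE2 entries as "/"-separated path strings (not the POIFS API), and the class name EmbeddedMsgPaths is hypothetical. It keeps only the entries below the embedded message's directory and strips that directory's own path, so its children become top-level entries. It deliberately does nothing about named properties or the properties-stream header.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical path-string model of an OLE2 container, for illustration only.
final class EmbeddedMsgPaths {
    static List<String> rootRelative(List<String> sourcePaths, String embeddedDir) {
        String prefix = embeddedDir.endsWith("/") ? embeddedDir : embeddedDir + "/";
        List<String> result = new ArrayList<>();
        for (String path : sourcePaths) {
            if (path.startsWith(prefix)) {
                // Drop the embedded directory's own path so its children land at the root.
                result.add(path.substring(prefix.length()));
            }
        }
        return result;
    }
}
```

For example, with embeddedDir set to __attach_version1.0_#00000000/__substg1.0_3701000D, the entry __attach_version1.0_#00000000/__substg1.0_3701000D/__properties_version1.0 becomes __properties_version1.0 in the new container, while the outer message's own entries are skipped.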
You'll probably want to do something like:
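// Assumes POI 3.x, where AttachmentChunks exposes public fields
MAPIMessage msg = new MAPIMessage(new NPOIFSFileSystem(new File("test.msg")));
int number = 0;
for (AttachmentChunks att : msg.getAttachmentFiles()) {
    // Embedded messages are stored as a sub-directory rather than a byte chunk
    if (att.attachmentDirectory != null) {
        number++;
        // New, empty OLE2 container for the extracted message
        POIFSFileSystem newMsg = new POIFSFileSystem();
        // Copy the embedded message's directory into the root of the new container
        EntryUtils.copyNodes(att.attachmentDirectory.getDirectory(), newMsg.getRoot());
        try (FileOutputStream out = new FileOutputStream("embedded-" + number + ".msg")) {
            newMsg.write(out);
        }
    }
}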
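If Outlook is unhappy with the result, try opening the source file in Outlook, saving the embedded message to a new file from there, and then using something like org.apache.poi.poifs.dev.POIFSLister and org.apache.poi.poifs.dev.POIFSDump to compare the Outlook-extracted file with the POI-extracted one, and see if you can spot any changes Outlook makes...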
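Once you have the entry names of both files (for example collected from POIFSLister output; how you gather them is up to you), a set difference is a quick first pass at spotting structural changes. This is a minimal stdlib sketch with a hypothetical class name; it only compares names, not stream contents, which is what POIFSDump is for.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: reports entry names present in listing a but missing from listing b.
final class EntryListingDiff {
    static Set<String> onlyIn(List<String> a, List<String> b) {
        Set<String> result = new TreeSet<>(a); // sorted for stable, readable output
        result.removeAll(b);
        return result;
    }
}
```

Calling onlyIn(outlookEntries, poiEntries) and onlyIn(poiEntries, outlookEntries) shows what each side has that the other lacks; an entry such as __nameid_version1.0 appearing only in the Outlook-saved file would be a hint worth following.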
Answer 1 (score: 0)
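I've added some necessary functionality to POI, plus a unit test that extracts an embedded MSG which can be opened in Outlook. Handling of named-ID properties is probably not complete (it wasn't relevant to the unit test that validates my enhancement).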
Unit test example:
Get the attached MSG:
MAPIMessage attachedMsg = attachments[0].getEmbeddedMessage();
Method to rebuild the attached MSG:
private POIFSFileSystem rebuildFromAttached(MAPIMessage attachedMsg) throws IOException {
    // Create new MSG and copy properties.
    POIFSFileSystem newDoc = new POIFSFileSystem();
    MessagePropertiesChunk topLevelChunk = new MessagePropertiesChunk(null);
    // Copy attachments and recipients.
    int recipientscount = 0;
    int attachmentscount = 0;
    for (Entry entry : attachedMsg.getDirectory()) {
        if (entry.getName().startsWith(RecipientChunks.PREFIX)) {
            recipientscount++;
            DirectoryEntry newDir = newDoc.createDirectory(entry.getName());
            for (Entry e : ((DirectoryEntry) entry)) {
                EntryUtils.copyNodeRecursively(e, newDir);
            }
        } else if (entry.getName().startsWith(AttachmentChunks.PREFIX)) {
            attachmentscount++;
            DirectoryEntry newDir = newDoc.createDirectory(entry.getName());
            for (Entry e : ((DirectoryEntry) entry)) {
                EntryUtils.copyNodeRecursively(e, newDir);
            }
        }
    }
    // Copy properties from properties stream.
    MessagePropertiesChunk mpc = attachedMsg.getMainChunks().getMessageProperties();
    for (Map.Entry<MAPIProperty, PropertyValue> p : mpc.getRawProperties().entrySet()) {
        PropertyValue val = p.getValue();
        if (!(val instanceof ChunkBasedPropertyValue)) {
            MAPIType type = val.getActualType();
            if (type != null && type != Types.UNKNOWN) {
                topLevelChunk.setProperty(val);
            }
        }
    }
    // Create nameid entries.
    DirectoryEntry nameid = newDoc.getRoot().createDirectory(NameIdChunks.NAME);
    // GUID stream
    nameid.createDocument(PropertiesChunk.DEFAULT_NAME_PREFIX + "00020102", new ByteArrayInputStream(new byte[0]));
    // Entry stream
    nameid.createDocument(PropertiesChunk.DEFAULT_NAME_PREFIX + "00030102", new ByteArrayInputStream(new byte[0]));
    // String stream
    nameid.createDocument(PropertiesChunk.DEFAULT_NAME_PREFIX + "00040102", new ByteArrayInputStream(new byte[0]));
    // Base properties.
    // Attachment/Recipient counter.
    topLevelChunk.setAttachmentCount(attachmentscount);
    topLevelChunk.setRecipientCount(recipientscount);
    topLevelChunk.setNextAttachmentId(attachmentscount);
    topLevelChunk.setNextRecipientId(recipientscount);
    // Unicode string format.
    byte[] storeSupportMaskData = new byte[4];
    PropertyValue.LongPropertyValue storeSupportPropertyValue = new PropertyValue.LongPropertyValue(MAPIProperty.STORE_SUPPORT_MASK,
            MessagePropertiesChunk.PROPERTIES_FLAG_READABLE | MessagePropertiesChunk.PROPERTIES_FLAG_WRITEABLE,
            storeSupportMaskData);
    storeSupportPropertyValue.setValue(0x00040000);
    topLevelChunk.setProperty(storeSupportPropertyValue);
    topLevelChunk.setProperty(new PropertyValue(MAPIProperty.HASATTACH,
            MessagePropertiesChunk.PROPERTIES_FLAG_READABLE | MessagePropertiesChunk.PROPERTIES_FLAG_WRITEABLE,
            attachmentscount == 0 ? new byte[] { 0 } : new byte[] { 1 }));
    // Copy properties from MSG file system.
    for (Chunk chunk : attachedMsg.getMainChunks().getChunks()) {
        if (!(chunk instanceof MessagePropertiesChunk)) {
            String entryName = chunk.getEntryName();
            String entryType = entryName.substring(entryName.length() - 4);
            int iType = Integer.parseInt(entryType, 16);
            MAPIType type = Types.getById(iType);
            if (type != null && type != Types.UNKNOWN) {
                MAPIProperty mprop = MAPIProperty.createCustom(chunk.getChunkId(), type, chunk.getEntryName());
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                chunk.writeValue(data);
                PropertyValue pval = new PropertyValue(mprop, MessagePropertiesChunk.PROPERTIES_FLAG_READABLE
                        | MessagePropertiesChunk.PROPERTIES_FLAG_WRITEABLE, data.toByteArray(), type);
                topLevelChunk.setProperty(pval);
            }
        }
    }
    topLevelChunk.writeProperties(newDoc.getRoot());
    return newDoc;
}
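The last loop in rebuildFromAttached relies on the fact that a property stream's name ends in four hex digits of property type, preceded by four hex digits of property ID. A stdlib-only sketch of that parsing step (the class SubstgName is hypothetical, not a POI type) could look like this; it mirrors the entryName.substring(entryName.length() - 4) step but also extracts the ID and rejects names that don't follow the pattern.

```java
// Hypothetical helper: splits "__substg1.0_IIIITTTT" into property ID (IIII) and type (TTTT).
final class SubstgName {
    static final String PREFIX = "__substg1.0_";

    final int propertyId;
    final int propertyType;

    private SubstgName(int propertyId, int propertyType) {
        this.propertyId = propertyId;
        this.propertyType = propertyType;
    }

    static SubstgName parse(String entryName) {
        if (!entryName.startsWith(PREFIX) || entryName.length() != PREFIX.length() + 8) {
            throw new IllegalArgumentException("Not a property stream name: " + entryName);
        }
        String tag = entryName.substring(PREFIX.length());
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) on non-hex digits
        int id = Integer.parseInt(tag.substring(0, 4), 16);
        int type = Integer.parseInt(tag.substring(4), 16);
        return new SubstgName(id, type);
    }
}
```

For instance, __substg1.0_0037001F parses to ID 0x0037 with type 0x001F (Unicode string), and the GUID stream name created above, __substg1.0_00020102, parses to ID 0x0002 with type 0x0102 (binary).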
Rebuild the attached MSG:
try (POIFSFileSystem extractedAttachedMsg = rebuildFromAttached(attachedMsg)) {
    try (ByteArrayOutputStream extractedAttachedMsgOut = new ByteArrayOutputStream()) {
        extractedAttachedMsg.writeFilesystem(extractedAttachedMsgOut);
        byte[] extractedAttachedMsgRaw = extractedAttachedMsgOut.toByteArray();
        // this byte array can be persisted to disk and opened in MS Outlook
    }
}
Regards, Dominik
Answer 2 (score: 0)
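The code in TestExtractEmbeddedMSG.java is somewhat outdated. It is fine for the unit test, but after investigating further, extracting an attached MSG file turns out to be much simpler.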
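Basically, extracting an embedded MSG file needs special handling because an embedded MSG is not a simple BLOB but a subdirectory within the overall structured storage container. That subdirectory contains everything that makes up the embedded MSG except for two things, as far as I know so far: (1) the named-property mapping (the __nameid_version1.0 storage), which exists only at the top level of the outer MSG, and (2) the top-level form of the properties stream header, which has 8 more reserved bytes than the header of an embedded message.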
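So, to turn the embedded MSG subdirectory into a top-level MSG, the top-level named-ID entries have to be copied into the new MSG (this could optionally be refined so that only the entries actually referenced by the embedded MSG are copied, but that would require parsing all named-ID entries and rebuilding them with only the referenced ones), and the properties stream has to be rebuilt with an additional 8 bytes of reserved data.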
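At the byte level, the properties-stream part of this is a simple splice. The sketch below works on a plain byte[] (no POI) and assumes the MS-OXMSG layout: an embedded message's stream starts with a 24-byte header (8 reserved bytes, then next recipient ID, next attachment ID, recipient count and attachment count, 4 bytes each), whereas a top-level message's header is 32 bytes, ending in 8 further reserved bytes. The class name PropertiesStreamHeader is hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: converts an embedded-message properties stream (24-byte header)
// into the top-level layout (32-byte header) by inserting 8 reserved zero bytes.
final class PropertiesStreamHeader {
    static final int EMBEDDED_HEADER = 24;

    static byte[] embeddedToTopLevel(byte[] embedded) {
        if (embedded.length < EMBEDDED_HEADER) {
            throw new IllegalArgumentException("Stream shorter than the 24-byte header");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(embedded.length + 8);
        // 8 reserved bytes + next recipient ID + next attachment ID + recipient count + attachment count
        out.write(embedded, 0, EMBEDDED_HEADER);
        // The extra 8 reserved bytes that only a top-level message has
        out.write(new byte[8], 0, 8);
        // Property entries, copied unchanged
        out.write(embedded, EMBEDDED_HEADER, embedded.length - EMBEDDED_HEADER);
        return out.toByteArray();
    }
}
```

The property entries themselves are left untouched; only the header grows, which is why the result is exactly 8 bytes longer than the input.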
This is how I do it now.
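rootmsg is the file system root of the top-level MSG, and attachedmsg is the file system root of the embedded MSG, which can be obtained by calling getDirectory on the MAPIMessage object. The embedded MAPIMessage object itself can be retrieved by calling getEmbeddedMessage on the AttachmentChunks object.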
Build the structured storage file system:
private static POIFSFileSystem rebuildMessageToStream(DirectoryNode rootmsg, DirectoryNode attachedmsg) throws Exception {
    //
    // Create new MSG file system and copy all entries.
    //
    POIFSFileSystem newDoc = new POIFSFileSystem();
    //
    // Copy nameid entries from root message.
    //
    if (rootmsg != null) {
        for (Entry entry : rootmsg) {
            if (entry.getName().startsWith(NameIdChunks.NAME)) {
                EntryUtils.copyNodeRecursively(entry, newDoc.getRoot());
            }
        }
    }
    //
    // Copy entries from origin message.
    //
    for (Entry entry : attachedmsg) {
        if (entry.getName().startsWith(PropertiesChunk.NAME) && entry.isDocumentEntry()) {
            if (rootmsg != null) {
                //
                // Rebuild properties stream: add 8 additional reserved bytes
                // to convert the embedded message properties stream into a root message properties stream.
                //
                // See MessagePropertiesChunk.writeHeaderData
                //
                DocumentEntry d = (DocumentEntry) entry;
                ByteArrayOutputStream rootps = new ByteArrayOutputStream(d.getSize() + 8);
                try (DocumentInputStream dstream = new DocumentInputStream(d)) {
                    //
                    // Copy first 8 bytes of reserved zeros plus 16 bytes for recipient/attachment counter.
                    //
                    byte[] data = new byte[24];
                    dstream.readFully(data);
                    rootps.write(data);
                    //
                    // Additional 8 bytes of reserved zeros.
                    //
                    rootps.write(new byte[8]);
                    //
                    // Properties (remaining data).
                    //
                    IOUtils.copy(dstream, rootps);
                }
                //
                // Create properties stream entry.
                //
                newDoc.getRoot().createDocument(entry.getName(), new ByteArrayInputStream(rootps.toByteArray()));
            } else {
                //
                // Copy properties stream unmodified.
                //
                EntryUtils.copyNodeRecursively(entry, newDoc.getRoot());
            }
        } else {
            //
            // Copy other entry.
            //
            EntryUtils.copyNodeRecursively(entry, newDoc.getRoot());
        }
    }
    return newDoc;
}
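To sanity-check a rebuilt properties stream, it can help to read the header counters back and compare them with the recipient and attachment storages actually present. The sketch below is stdlib-only with a hypothetical class name, and it assumes the MS-OXMSG top-level header layout (little-endian): 8 reserved bytes, next recipient ID, next attachment ID, recipient count, attachment count, then 8 more reserved bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: reads the four 32-bit counters from a top-level properties stream header.
final class TopLevelHeader {
    final int nextRecipientId;
    final int nextAttachmentId;
    final int recipientCount;
    final int attachmentCount;

    private TopLevelHeader(int nextRecipientId, int nextAttachmentId, int recipientCount, int attachmentCount) {
        this.nextRecipientId = nextRecipientId;
        this.nextAttachmentId = nextAttachmentId;
        this.recipientCount = recipientCount;
        this.attachmentCount = attachmentCount;
    }

    static TopLevelHeader read(byte[] stream) {
        if (stream.length < 32) {
            throw new IllegalArgumentException("Shorter than a 32-byte top-level header");
        }
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(8); // skip the leading 8 reserved bytes
        // Java evaluates arguments left to right, so the fields are read in header order
        return new TopLevelHeader(buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt());
    }
}
```

Because this only reads the header, it works the same on the output of either rebuild method above.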
Serialize the structured storage:
try (POIFSFileSystem extractedAttachedMsg = rebuildMessageToStream(rootmsg, attachedmsg)) {
    try (ByteArrayOutputStream extractedAttachedMsgOut = new ByteArrayOutputStream()) {
        extractedAttachedMsg.writeFilesystem(extractedAttachedMsgOut);
        byte[] extractedAttachedMsgRaw = extractedAttachedMsgOut.toByteArray();
        // this byte array can be persisted to disk and opened in MS Outlook
    }
}
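When persisting the serialized bytes, a cheap guard is to check that they start with the 8-byte OLE2/CFB signature (D0 CF 11 E0 A1 B1 1A E1) that every .msg container begins with. This is a minimal stdlib sketch with a hypothetical class name; passing the check only means the bytes look like an OLE2 container, not that Outlook will accept the message.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: verifies the OLE2 signature, then writes the bytes to disk.
final class MsgBytes {
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static boolean looksLikeOle2(byte[] data) {
        return data.length >= OLE2_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(data, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }

    static void save(byte[] data, Path target) throws IOException {
        if (!looksLikeOle2(data)) {
            throw new IOException("Not an OLE2 container, refusing to write " + target);
        }
        Files.write(target, data); // creates or truncates the target file
    }
}
```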