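I have the Enron email dataset set up as a folder containing the emails as text files, and I want to extract the "body" part of these emails.
The problem is that fields such as the sender's and recipient's addresses are marked by To:, From:, and so on, but the body does not start with any header; it simply begins after all the other fields have been specified.
A text file can also contain many entities (if it is an email thread/conversation). I want to extract the bodies from these files. Can the JavaMail API be used for this, and if so, how? This is just an offline dataset, stored as text files on my hard drive, not on the internet.
The files look like this: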
Message-ID: <16159836.1075855377439.JavaMail.evans@thyme>
Date: Fri, 7 Dec 2001 10:06:42 -0800 (PST)
From: heather.dunton@enron.com
To: k..allen@enron.com
Subject: RE: West Position
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Dunton, Heather </O=ENRON/OU=NA/CN=RECIPIENTS/CN=HDUNTON>
X-To: Allen, Phillip K. </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Pallen>
X-cc:
X-bcc:
X-Folder: \Phillip_Allen_Jan2002_1\Allen, Phillip K.\Inbox
X-Origin: Allen-P
X-FileName: pallen (Non-Privileged).pst
Please let me know if you still need Curve Shift.
Thanks,
Heather
-----Original Message-----
From: Allen, Phillip K.
Sent: Friday, December 07, 2001 5:14 AM
To: Dunton, Heather
Subject: RE: West Position
Heather,
Did you attach the file to this email?
-----Original Message-----
From: Dunton, Heather
Sent: Wednesday, December 05, 2001 1:43 PM
To: Allen, Phillip K.; Belden, Tim
Subject: FW: West Position
Attached is the Delta position for 1/16, 1/30, 6/19, 7/13, 9/21
-----Original Message-----
From: Allen, Phillip K.
Sent: Wednesday, December 05, 2001 6:41 AM
To: Dunton, Heather
Subject: RE: West Position
Heather,
This is exactly what we need. Would it possible to add the prior day for each of the dates below to the pivot table. In order to validate the curve shift on the dates below we also need the prior days ending positions.
Thank you,
Phillip Allen
-----Original Message-----
From: Dunton, Heather
Sent: Tuesday, December 04, 2001 3:12 PM
To: Belden, Tim; Allen, Phillip K.
Cc: Driscoll, Michael M.
Subject: West Position
Attached is the Delta position for 1/18, 1/31, 6/20, 7/16, 9/24
<< File: west_delta_pos.xls >>
Let me know if you have any questions.
Heather
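Answer 0 (score: 0)
Please provide a sample file, ideally the most complex one you have. The job is to programmatically open each file, parse its contents, and extract the body of the email. Where do you want to store the result? Which operating system are you running?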
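The open/parse/extract steps can be sketched in plain Java without any mail library. This is a minimal sketch, assuming the raw files on disk follow the usual RFC 822 layout in which a blank line separates the header block from the body (the pasted excerpt in the question does not show that blank line, so check a real file first). The class name `EnronBodyExtractor` and its methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: not part of JavaMail or of the dataset tooling.
final class EnronBodyExtractor {

    /** Returns everything after the first blank line, or "" if there is none. */
    static String extractBody(String rawMessage) {
        // Normalize CRLF to LF so both kinds of files are handled alike.
        String text = rawMessage.replace("\r\n", "\n");
        // Assumption: the first empty line ends the header block.
        int split = text.indexOf("\n\n");
        return split < 0 ? "" : text.substring(split + 2);
    }

    /** Maps every regular file under root to its extracted body. */
    static Map<Path, String> extractAll(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<Path, String> bodies = new LinkedHashMap<>();
        for (Path file : files) {
            // ISO-8859-1 maps every byte to a character, so decoding never fails.
            String raw = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
            bodies.put(file, extractBody(raw));
        }
        return bodies;
    }
}
```

Calling `EnronBodyExtractor.extractAll(Paths.get(...))` on the dataset folder would give one body string per file; where to store the results is left open, as in the answer above.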
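Answer 1 (score: 0)
If each file is a single message in MIME format, you can use the JavaMail MimeMessage constructor that takes an InputStream. You can then use the JavaMail API to extract the message's content. See the JavaMail FAQ, javadocs, website, specification, and so on.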
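Note that for a text/plain message like the sample, `new MimeMessage(session, inputStream)` followed by `getContent()` returns the whole body as one string, quoted earlier messages included, so splitting a thread into its individual messages is a separate step. The following is a minimal plain-Java sketch of that step, assuming every quoted message is introduced by an Outlook-style `-----Original Message-----` line followed by `From:`/`Sent:`/`To:`/`Cc:`/`Subject:` lines, as in the sample; other quoting styles in the dataset would need different rules. `ThreadSplitter` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: splits one body string into the messages of a thread.
final class ThreadSplitter {

    // Separator line as seen in the sample, with optional surrounding blanks.
    private static final Pattern SEPARATOR =
            Pattern.compile("(?m)^[ \\t]*-----Original Message-----[ \\t]*$");

    // Header lines that Outlook places at the top of a quoted message.
    private static final Pattern QUOTED_HEADER =
            Pattern.compile("^(From|Sent|To|Cc|Bcc|Subject):.*");

    /** Returns the message texts in file order (newest first), headers removed. */
    static List<String> splitThread(String body) {
        List<String> messages = new ArrayList<>();
        String[] segments = SEPARATOR.split(body);
        for (int i = 0; i < segments.length; i++) {
            // The first segment is the newest message and has no quoted headers.
            String text = (i == 0 ? segments[i] : stripQuotedHeaders(segments[i])).trim();
            if (!text.isEmpty()) {
                messages.add(text);
            }
        }
        return messages;
    }

    /** Drops leading blank lines and quoted header lines from a segment. */
    static String stripQuotedHeaders(String segment) {
        String[] lines = segment.split("\\r?\\n", -1);
        int start = 0;
        while (start < lines.length) {
            String line = lines[start].trim();
            if (!line.isEmpty() && !QUOTED_HEADER.matcher(line).matches()) {
                break;
            }
            start++;
        }
        return String.join("\n", Arrays.copyOfRange(lines, start, lines.length));
    }
}
```

Passing the body string of a file (however it was obtained) to `ThreadSplitter.splitThread` yields one entry per message in the conversation. A quoted message whose own text begins with a line such as `To: ...` would lose that line, which is a limitation of this heuristic.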