Extracting From and To addresses from an email body containing a reply chain - JavaMail API

Time: 2015-07-31 22:43:20

Tags: email parsing javamail

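I am trying to extract content from the Enron dataset. I thought I would try the JavaMail API because it makes parsing easy. However, I am new to JavaMail and have been working from a few resources I found online.

I was able to create a MimeMessage object for a file and extract the various header fields. object.getContent() gives me the content of the body.

What I want to do is extract the From and To addresses from the body. I don't know how to do that.

I read about creating a Multipart object and extracting the information from it: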

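  1. Use javax.mail.Message.getContent() to get the content of the message. This should return the entire content of the message in an object of type javax.mail.Multipart.

  2. Use the methods on javax.mail.Multipart to retrieve a particular part of the message. This should be encapsulated in an object of type javax.mail.BodyPart.

  3. Use the methods on javax.mail.BodyPart to retrieve the content of the particular part of the message you are interested in.

The MIME type specified in my case is not Multipart. However, when I try the approach above, I get: Exception in thread "main" java.lang.ClassCastException: java.lang.String cannot be cast to javax.mail.Message

What should I do?

Here is the content of the file I am trying to parse.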

    Message-ID: <16159836.1075855377439.JavaMail.evans@thyme>
    Date: Fri, 7 Dec 2001 10:06:42 -0800 (PST)
    From: heather.dunton@enron.com
    To: k..allen@enron.com
    Subject: RE: West Position
    Mime-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit
    X-From: Dunton, Heather </O=ENRON/OU=NA/CN=RECIPIENTS/CN=HDUNTON>
    X-To: Allen, Phillip K. </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Pallen>
    X-cc: 
    X-bcc: 
    X-Folder: \Phillip_Allen_Jan2002_1\Allen, Phillip K.\Inbox
    X-Origin: Allen-P
    X-FileName: pallen (Non-Privileged).pst
    
    
    Please let me know if you still need Curve Shift.
    
    Thanks,
    Heather
     -----Original Message-----
    From:   Allen, Phillip K.  
    Sent:   Friday, December 07, 2001 5:14 AM
    To: Dunton, Heather
    Subject:    RE: West Position
    
    Heather,
    
    Did you attach the file to this email?
    
     -----Original Message-----
    From:   Dunton, Heather  
    Sent:   Wednesday, December 05, 2001 1:43 PM
    To: Allen, Phillip K.; Belden, Tim
    Subject:    FW: West Position
    
    Attached is the Delta position for 1/16, 1/30, 6/19, 7/13, 9/21
    
    
     -----Original Message-----
    From:   Allen, Phillip K.  
    Sent:   Wednesday, December 05, 2001 6:41 AM
    To: Dunton, Heather
    Subject:    RE: West Position
    
    Heather,
    
    This is exactly what we need.  Would it possible to add the prior day for each of the dates below to the pivot table.  In order to validate the curve shift on the dates below we also need the prior days ending positions.
    
    Thank you,
    
    Phillip Allen
    
     -----Original Message-----
    From:   Dunton, Heather  
    Sent:   Tuesday, December 04, 2001 3:12 PM
    To: Belden, Tim; Allen, Phillip K.
    Cc: Driscoll, Michael M.
    Subject:    West Position
    
    
    Attached is the Delta position for 1/18, 1/31, 6/20, 7/16, 9/24
    
    
    
     << File: west_delta_pos.xls >> 
    
    Let me know if you have any questions.
    
    
    Heather
    

Here is the code I used:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Properties;
    import javax.mail.Address;
    import javax.mail.BodyPart;
    import javax.mail.Message;
    import javax.mail.MessagingException;
    import javax.mail.Multipart;
    import javax.mail.Session;
    import javax.mail.internet.MimeMessage;

    private void mailParser() throws IOException, MessagingException {
        File mailFiles = new File("/xxx/xx/xx/x/x/inbox/1");
        String host = "host.com";
        Properties properties = System.getProperties();

        properties.setProperty("mail.smtp.host", host);
        Session session = Session.getDefaultInstance(properties);

        // try-with-resources closes the stream once the message has been read
        try (FileInputStream fis = new FileInputStream(mailFiles)) {
            MimeMessage email = new MimeMessage(session, fis);

            // Message ID
            System.out.println("message id: " + email.getMessageID());

            // Date
            System.out.println("sent date : " + email.getSentDate());

            // From
            Address[] add = email.getFrom();
            if (add != null) {
                for (int i = 0; i < add.length; i++) {
                    System.out.println("FROM  : " + add[i].toString());
                }
            }

            // Subject
            System.out.println("\nsubject: " + email.getSubject());

            // TO
            if (email.getRecipients(Message.RecipientType.TO) != null) {
                System.out.println("\nrecipients to: " + Arrays.asList(email.getRecipients(Message.RecipientType.TO)));
            }

            // CC
            if (email.getRecipients(Message.RecipientType.CC) != null) {
                System.out.println("\nrecipients cc: " + Arrays.asList(email.getRecipients(Message.RecipientType.CC)));
            }

            // BCC
            if (email.getRecipients(Message.RecipientType.BCC) != null) {
                System.out.println("\nrecipients bcc: " + Arrays.asList(email.getRecipients(Message.RecipientType.BCC)));
            }

            // Content type
            System.out.println("content type: " + email.getContentType());

            // Content encoding
            System.out.println("encoding: " + email.getEncoding());

            // Content of email.
            // This cast is the line that throws the ClassCastException:
            // for this text/plain message getContent() returns a String.
            Message message = (Message) email.getContent();

            if (message instanceof MimeMessage) {
                MimeMessage m = (MimeMessage) message;
                Object contentObject = m.getContent();
                if (contentObject instanceof Multipart) {
                    Multipart content = (Multipart) contentObject;
                    // Take the first body part, if there is one
                    BodyPart clearTextPart = content.getCount() > 0 ? content.getBodyPart(0) : null;
                    if (clearTextPart != null) {
                        String result = (String) clearTextPart.getContent();
                        System.out.println(result);
                    }
                }
            }

            System.out.println("Content of email" + email.getContent().toString());
        } catch (MessagingException e) {
            throw new IllegalStateException("illegal state issue", e);
        } catch (FileNotFoundException e) {
            throw new IllegalStateException("file not found issue: " + mailFiles.getAbsolutePath(), e);
        }
    }
    

1 Answer:

Answer 0: (score: 1)

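What you're seeing is a reply to a reply to a reply to a message, with the original message text and some of the header information included in the reply as new text. As far as MIME is concerned, the text of the original message appears in the reply just as if you had typed it in yourself, like any other part of the reply's text. The "Original Message" separator is not something MIME knows about. The top-level message is just a plain text message, not a multipart message, and it has no MIME structure. For a text/plain message, getContent() returns a String, which is why the cast to javax.mail.Message fails.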

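Because JavaMail parses the MIME structure of the message, it doesn't do anything special with the message content. I'm afraid you're pretty much on your own when it comes to parsing the content of the message to extract the included/replied-to message text.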
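As a rough illustration of what "parsing it yourself" could look like, here is a minimal sketch that splits the body String on the "-----Original Message-----" line and collects the From/Sent/To/Cc/Subject lines that follow each separator. The class name ReplyChainParser and its parse method are hypothetical, not part of JavaMail, and the sketch assumes the Outlook-style layout shown in the question, with each pseudo-header on a single line. Recipient lists that wrap onto a second line are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the quoted pseudo-headers out of a plain-text body. */
final class ReplyChainParser {

    // Separator line inserted before each quoted message (layout assumed from the sample).
    private static final Pattern SEPARATOR =
            Pattern.compile("(?m)^[ \\t]*-----Original Message-----[ \\t]*$");

    // Pseudo-header lines such as "From:   Allen, Phillip K.".
    private static final Pattern FIELD =
            Pattern.compile("^\\s*(From|Sent|To|Cc|Subject):\\s*(.*?)\\s*$");

    /** Returns one field map per quoted message, in the order they appear in the body. */
    static List<Map<String, String>> parse(String body) {
        List<Map<String, String>> quoted = new ArrayList<>();
        String[] chunks = SEPARATOR.split(body);
        // chunks[0] is the text typed by the sender of the top-level message.
        for (int i = 1; i < chunks.length; i++) {
            Map<String, String> fields = new LinkedHashMap<>();
            for (String line : chunks[i].split("\\r?\\n")) {
                if (line.trim().isEmpty()) {
                    if (fields.isEmpty()) {
                        continue; // skip blank lines before the pseudo-headers
                    }
                    break; // a blank line ends the pseudo-header block
                }
                Matcher m = FIELD.matcher(line);
                if (!m.matches()) {
                    break; // first non-header line: the quoted text starts here
                }
                fields.put(m.group(1), m.group(2));
            }
            quoted.add(fields);
        }
        return quoted;
    }

    public static void main(String[] args) {
        String body = String.join("\n",
                "Please let me know if you still need Curve Shift.",
                "",
                "Thanks,",
                "Heather",
                " -----Original Message-----",
                "From:   Allen, Phillip K.  ",
                "Sent:   Friday, December 07, 2001 5:14 AM",
                "To: Dunton, Heather",
                "Subject:    RE: West Position",
                "",
                "Heather,",
                "",
                "Did you attach the file to this email?");
        for (Map<String, String> fields : parse(body)) {
            System.out.println(fields.get("From") + " -> " + fields.get("To"));
        }
    }
}
```

In the question's code, the body would come from something like `(String) email.getContent()`, guarded by an `instanceof String` check. The values returned here are still display names, so they would need to be matched against real addresses separately.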

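You'll also notice that the From and To addresses in the message body are just names, not email addresses, and they are not in RFC 2822 format. The date is not in the proper format either. The mail reader (most likely Outlook) simply includes the text of the original message in the reply in a "human readable" format, for convenience.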
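Since those quoted values are free-form text, they have to be normalized by hand. The sketch below assumes the date layout seen in the sample ("Friday, December 07, 2001 5:14 AM") and semicolon-separated names; both are inferred from this one message and may not hold across the whole dataset. QuotedHeaderValues, parseSent and splitNames are hypothetical names used only for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for the human-readable values found in quoted replies. */
final class QuotedHeaderValues {

    // Pattern inferred from the sample, e.g. "Friday, December 07, 2001 5:14 AM".
    private static final DateTimeFormatter SENT_FORMAT =
            DateTimeFormatter.ofPattern("EEEE, MMMM dd, yyyy h:mm a", Locale.ENGLISH);

    /** Parses a quoted "Sent:" value. The text carries no time zone, so the result is local. */
    static LocalDateTime parseSent(String value) {
        return LocalDateTime.parse(value.trim(), SENT_FORMAT);
    }

    /** Splits a value such as "Allen, Phillip K.; Belden, Tim" into display names. */
    static List<String> splitNames(String value) {
        List<String> names = new ArrayList<>();
        for (String name : value.split(";")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(parseSent("Friday, December 07, 2001 5:14 AM"));
        System.out.println(splitNames("Allen, Phillip K.; Belden, Tim"));
    }
}
```

Mapping a display name back to an address is a separate problem. One option, which is an assumption rather than something the dataset guarantees, is to build a lookup from the X-From/X-To and From/To header pairs of other messages.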