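How to distinguish inline images from signature and other blank images in an email (IMAP)

Date: 2016-06-23 09:33:42

Tags: c# email model-view-controller imap mailkit

I am using MailKit to fetch email from a mailbox and save it to a database so that it can be displayed in my MVC application.

I save the HTML email as plain text in the database, and I can fetch attachments and save them to the file system. However, when an email contains inline images, I run into a problem: signature images and other blank images also get saved to the file system as attachments.

Is there a way to distinguish inline attachments from signature or other blank images?

Thanks in advance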

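2 answers:

Answer 0 (score: 3)

It doesn't matter which IMAP library you use. None of them has a feature that will do what you want, because this is a non-trivial problem that takes some ingenuity to solve.

What you can do is start with the HtmlPreviewVisitor sample from the FAQ and modify it ever so slightly so that it splits the attachments into 2 lists:

  1. The list of actual attachments
  2. The list of images referenced by the HTML (found by walking the HTML and tracking which images are referenced)

Code: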

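    using System;
    using System.Collections.Generic;
    using MimeKit;
    using MimeKit.Text;
    using MimeKit.Tnef;

    /// <summary>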
    /// Visits a MimeMessage and splits attachments into those that are
    /// referenced by the HTML body vs regular attachments.
    /// </summary>
    class AttachmentVisitor : MimeVisitor
    {
        List<MultipartRelated> stack = new List<MultipartRelated> ();
        List<MimeEntity> attachments = new List<MimeEntity> ();
        List<MimePart> embedded = new List<MimePart> ();
        bool foundBody;
    
        /// <summary>
        /// Creates a new AttachmentVisitor.
        /// </summary>
        public AttachmentVisitor ()
        {
        }
    
        /// <summary>
        /// The list of attachments that were in the MimeMessage.
        /// </summary>
        public IList<MimeEntity> Attachments {
            get { return attachments; }
        }
    
        /// <summary>
        /// The list of embedded images that were in the MimeMessage.
        /// </summary>
        public IList<MimePart> EmbeddedImages {
            get { return embedded; }
        }
    
        protected override void VisitMultipartAlternative (MultipartAlternative alternative)
        {
            // walk the multipart/alternative children backwards from greatest level of faithfulness to the least faithful
            for (int i = alternative.Count - 1; i >= 0 && !foundBody; i--)
                alternative[i].Accept (this);
        }
    
        protected override void VisitMultipartRelated (MultipartRelated related)
        {
            var root = related.Root;
    
            // push this multipart/related onto our stack
            stack.Add (related);
    
            // visit the root document
            root.Accept (this);
    
            // pop this multipart/related off our stack
            stack.RemoveAt (stack.Count - 1);
        }
    
        // look up the image based on the img src url within our multipart/related stack
        bool TryGetImage (string url, out MimePart image)
        {
            UriKind kind;
            int index;
            Uri uri;
    
            if (Uri.IsWellFormedUriString (url, UriKind.Absolute))
                kind = UriKind.Absolute;
            else if (Uri.IsWellFormedUriString (url, UriKind.Relative))
                kind = UriKind.Relative;
            else
                kind = UriKind.RelativeOrAbsolute;
    
            try {
                uri = new Uri (url, kind);
            } catch {
                image = null;
                return false;
            }
    
            for (int i = stack.Count - 1; i >= 0; i--) {
                if ((index = stack[i].IndexOf (uri)) == -1)
                    continue;
    
                image = stack[i][index] as MimePart;
                return image != null;
            }
    
            image = null;
    
            return false;
        }
    
        // called when an HTML tag is encountered
        void HtmlTagCallback (HtmlTagContext ctx, HtmlWriter htmlWriter)
        {
            if (ctx.TagId == HtmlTagId.Image && !ctx.IsEndTag && stack.Count > 0) {
                // search for the src= attribute
                foreach (var attribute in ctx.Attributes) {
                    if (attribute.Id == HtmlAttributeId.Src) {
                        MimePart image;
    
                        if (!TryGetImage (attribute.Value, out image))
                            continue;
    
                        if (!embedded.Contains (image))
                            embedded.Add (image);
                    }
                }
            }
        }
    
        protected override void VisitTextPart (TextPart entity)
        {
            TextConverter converter;
    
            if (foundBody) {
                // since we've already found the body, treat this as an
                // attachment
                attachments.Add (entity);
                return;
            }
    
            if (entity.IsHtml) {
                converter = new HtmlToHtml {
                    HtmlTagCallback = HtmlTagCallback
                };
    
                converter.Convert (entity.Text);
            }
    
            foundBody = true;
        }
    
        protected override void VisitTnefPart (TnefPart entity)
        {
            // extract any attachments in the MS-TNEF part
            attachments.AddRange (entity.ExtractAttachments ());
        }
    
        protected override void VisitMessagePart (MessagePart entity)
        {
            // treat message/rfc822 parts as attachments
            attachments.Add (entity);
        }
    
        protected override void VisitMimePart (MimePart entity)
        {
            // realistically, if we've gotten this far, then we can treat
            // this as an attachment even if the IsAttachment property is
            // false.
            attachments.Add (entity);
        }
    }
    

To use it:

    var visitor = new AttachmentVisitor ();
    
    message.Accept (visitor);
    
    // Now you can use visitor.Attachments and visitor.EmbeddedImages
    

A simpler, although less error-proof, way of doing it (it does not actually verify that the image is referenced by the HTML) would be:

    // requires: using System; using System.Linq; using MimeKit;
    var embeddedImages = message.BodyParts.OfType<MimePart> ().
        Where (x => x.ContentType.IsMimeType ("image", "*") &&
               x.ContentDisposition != null &&
               x.ContentDisposition.Disposition.Equals ("inline", StringComparison.OrdinalIgnoreCase));
    

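Now that you have the list of embeddedImages, you will have to figure out a way to determine whether they are used only in the signature or elsewhere in the HTML.

Most likely you will have to analyze the HTML itself as well.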
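For the "blank image" part of the problem, one cheap pre-filter can be sketched without any HTML analysis. This is an assumption, not something MailKit provides: read the pixel dimensions from the image header and treat tiny images (such as 1x1 spacers or tracking pixels) as blank. `ImageHeader`, `TryGetSize`, `IsLikelyBlank` and the `maxSide` threshold are hypothetical names, only PNG and GIF are handled, and the byte array is assumed to hold the decoded content of the image part.

```csharp
using System;

static class ImageHeader
{
    // Reads width/height from a PNG or GIF header. Returns false for any
    // other format or for truncated data.
    public static bool TryGetSize (byte[] data, out int width, out int height)
    {
        width = height = 0;

        if (data == null)
            return false;

        // PNG: 8-byte signature, then the IHDR chunk (4-byte length, "IHDR",
        // big-endian 32-bit width and height at offsets 16 and 20).
        if (data.Length >= 24 && data[0] == 0x89 && data[1] == 'P' && data[2] == 'N' && data[3] == 'G') {
            width = ReadInt32BigEndian (data, 16);
            height = ReadInt32BigEndian (data, 20);
            return true;
        }

        // GIF: "GIF87a" or "GIF89a", then little-endian 16-bit width and height.
        if (data.Length >= 10 && data[0] == 'G' && data[1] == 'I' && data[2] == 'F') {
            width = data[6] | (data[7] << 8);
            height = data[8] | (data[9] << 8);
            return true;
        }

        return false;
    }

    static int ReadInt32BigEndian (byte[] data, int offset)
    {
        return (data[offset] << 24) | (data[offset + 1] << 16) | (data[offset + 2] << 8) | data[offset + 3];
    }

    // Heuristic (an assumption): an image whose sides are both at most
    // maxSide pixels is probably a spacer or tracking pixel, not content.
    public static bool IsLikelyBlank (byte[] data, int maxSide = 2)
    {
        int width, height;

        return TryGetSize (data, out width, out height) && width <= maxSide && height <= maxSide;
    }
}
```

A signature logo will usually pass this filter, so it only removes spacers. Telling a logo apart from a content image still needs the HTML context.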

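It is probably also worth noting that some HTML mail references images located on the web that are not embedded in the MIME of the message. If you want these images as well, you will need to modify TryGetImage to fall back to downloading the image from the web whenever the code I provided fails to locate it within the MIME of the message.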
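Before adding such a fallback, it helps to know what kind of reference an img src value is. The following is a minimal sketch using only the base class library; `ImageSourceKind` and `ImageSource.Classify` are hypothetical names and not part of MimeKit. Note that a value classified as `Other` (for example a relative path) may still match a Content-Location header inside the multipart/related, so it should still go through the MIME lookup first.

```csharp
using System;

enum ImageSourceKind
{
    ContentId, // cid: reference to a part inside the message
    Remote,    // http/https URL that would have to be downloaded
    DataUri,   // image bytes inlined in the HTML itself
    Other      // relative paths and anything else
}

static class ImageSource
{
    // Classifies the value of an <img src="..."> attribute.
    public static ImageSourceKind Classify (string src)
    {
        if (string.IsNullOrWhiteSpace (src))
            return ImageSourceKind.Other;

        src = src.Trim ();

        if (src.StartsWith ("cid:", StringComparison.OrdinalIgnoreCase))
            return ImageSourceKind.ContentId;

        if (src.StartsWith ("data:", StringComparison.OrdinalIgnoreCase))
            return ImageSourceKind.DataUri;

        Uri uri;
        if (Uri.TryCreate (src, UriKind.Absolute, out uri) &&
            (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
            return ImageSourceKind.Remote;

        return ImageSourceKind.Other;
    }
}
```

Only the `Remote` case would need a download. Fetching remote images automatically also tells the sender that the message was opened, so it may be preferable to make that step opt-in.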

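For text/plain messages (which cannot contain images at all), the common convention for separating the signature from the rest of the message body is a line consisting of only 2 dashes and a space: "-- ".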
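The text/plain convention is simple enough to sketch in a few lines. `PlainTextSignature.Split` is a hypothetical helper, and splitting at the last delimiter line (rather than the first) is an assumption made here so that a delimiter quoted earlier in the text does not cut the body short.

```csharp
using System;

static class PlainTextSignature
{
    // Splits a text/plain body at the last line that is exactly "-- "
    // (two dashes followed by one space). Line endings are normalized to
    // "\n". When no delimiter line exists, the signature is empty.
    public static (string Body, string Signature) Split (string text)
    {
        var normalized = text.Replace ("\r\n", "\n");
        var lines = normalized.Split ('\n');
        int index = Array.LastIndexOf (lines, "-- ");

        if (index < 0)
            return (normalized, string.Empty);

        var body = string.Join ("\n", lines, 0, index);
        var signature = string.Join ("\n", lines, index + 1, lines.Length - index - 1);

        return (body, signature);
    }
}
```

Some mail clients strip trailing whitespace, which turns the delimiter into a bare "--". Whether to accept that variant as well is a judgment call, since it also matches ordinary text.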

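In my limited experience with HTML messages that have signatures, they do not seem to follow a similar convention. Looking at a few HTML messages I have received from co-workers at Microsoft who use Outlook, their signatures appear to sit inside a <table> at the end of the message. However, this assumes that the message is not a reply. Once you start parsing message replies, this <table> ends up somewhere in the middle of the message, because the original message being replied to is at the end.

Since everyone's signature is different, I am not sure whether this <table> similarity is an Outlook convention, or whether people construct their signatures by hand and all just happen to use tables (I have also only seen a few, since most people do not use signatures, so my sample size is very small).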

Answer 1 (score: 1)

Use https://mailsystem.codeplex.com/

A class that reads the email:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using ActiveUp.Net.Mail;

    class readMail : IDisposable
    {
        public Imap4Client client = new Imap4Client();

        public readMail(string mailServer, int port, bool ssl, string login, string password)
        {
            // connect with or without SSL, then authenticate
            if (ssl)
                client.ConnectSsl(mailServer, port);
            else
                client.Connect(mailServer, port);

            client.Login(login, password);
        }

        public IEnumerable<Message> GetAllMails(string mailBox)
        {
            // fetch and parse every message in the mailbox
            return GetMails(mailBox, "ALL").Cast<Message>();
        }

        protected Imap4Client Client
        {
            get { return client ?? (client = new Imap4Client()); }
        }

        private MessageCollection GetMails(string mailBox, string searchPhrase)
        {
            // exceptions propagate to the caller instead of being swallowed
            Mailbox mails = Client.SelectMailbox(mailBox);
            return mails.SearchParse(searchPhrase);
        }

        public void Dispose()
        {
            // assumes Imap4Client.Disconnect() logs out and closes the connection,
            // so that a using block cleans up instead of throwing
            if (client != null)
                client.Disconnect();
        }
    }

Then:

    using (readMail read = new readMail("host.name.information", port, true, username, password))
    {
        var emailList = read.GetAllMails(this.folderEmail);

        // ids of the messages that have not been read yet
        Mailbox mailbox = read.client.SelectMailbox(this.folderEmail);
        int[] unseen = mailbox.Search("UNSEEN");

        foreach (Message email in emailList)
        {
            // Contains all parts for which no Content-Disposition header was found. Disposition is left to the final agent.
            MimePartCollection im1 = email.UnknownDispositionMimeParts;
            // Collection containing embedded MIME parts of the message (included text parts)
            EmbeddedObjectCollection im2 = email.EmbeddedObjects;
            // Collection containing attachments of the message.
            AttachmentCollection attach = email.Attachments;
        }
    }

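In my case, all the signature images ended up in UnknownDispositionMimeParts, but that may be specific to my situation (different email clients and so on). As far as I know, there is no library that separates embedded images into contextual images and signature images.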