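Parsing an email with Nokogiri

Time: 2012-05-01 05:48:49

Tags: ruby-on-rails parsing

I'm looking for some direction on how to parse an email with Nokogiri. A sample email is below. I've read this document, http://nokogiri.org/tutorials/parsing_an_html_xml_document.html, and spent time searching Google. I'm new to Ruby on Rails and am looking for a good example or a detailed explanation. Thanks for your time.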

MIME-Version: 1.0
Received: by 10.76.129.52; Mon, 30 Apr 2012 22:11:24 -0700 (PDT)
Date: Mon, 30 Apr 2012 22:11:24 -0700
Message-ID: <CAJq2oOCB-UzNEFGc+3TVBSEA0L9VPRrjevhdW_KK41C+AGDjJw@mail.gmail.com>
Subject: Customize Gmail with colors and themes
From: Gmail Team <mail-noreply@google.com>
To: parse email <parseemail2@gmail.com>
Content-Type: multipart/alternative; boundary=bcaec545501825242f04bef29a74

--bcaec545501825242f04bef29a74
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

To spice up your inbox with colors and themes, check out the Themes tab
under Settings.
       Customize Gmail =BB <https://mail.google.com/mail/#settings/themes>


Enjoy!

- The Gmail Team
[image: Themes thumbnails]

Please note that Themes are not available if you're using Internet Explorer
6.0. To take advantage of the latest Gmail features, please upgrade to a
fully supported
browser<http://support.google.com/mail/bin/answer.py?answer=3D6557&hl=3Den&=
utm_source=3Dwel-eml&utm_medium=3Deml&utm_campaign=3Den>
.

--bcaec545501825242f04bef29a74
Content-Type: text/html; charset=ISO-8859-1

<html>
<font face="Arial, Helvetica, sans-serif">
<p>To spice up your inbox with colors and themes, check out the Themes tab
under Settings.</p>

<table cellpadding="0" cellspacing="0">
  <col style="width: 1px;"/>
  <col/>
  <col style="width: 1px;"/>
  <tr>
    <td></td>
    <td height="1px" style="background-color: #ddd"></td>
    <td></td>
  </tr>
  <tr>
    <td style="background-color: #ddd"></td>
    <td background="https://mail.google.com/mail/images/welcome-button-background.png"
        style="background-color: #ddd; background-repeat: repeat-x;
            padding: 10px; font-size: larger">
          <a href="https://mail.google.com/mail/#settings/themes"
            style="font-weight: bold; color: #000; text-decoration: none;
            display: block;">
      Customize Gmail &#187;</a>
    </td>
    <td style="background-color: #ddd"></td>
  </tr>
 <tr>
    <td></td>
    <td height="1px" style="background-color: #ddd"></td>
    <td></td>
  </tr>
</table>

<p>Enjoy!</p>

<p>- The Gmail Team</p>

<img width="398" height="256" src="https://mail.google.com/mail/images/gmail_themes_2.png"
alt="Themes thumbnails" />

<p><font size="-2" color="#999">Please note that Themes are not available if
you're using Internet Explorer 6.0. To take advantage of the latest Gmail
features, please
<a href="http://support.google.com/mail/bin/answer.py?answer=6557&hl=en&utm_source=wel-    
eml&utm_medium=eml&utm_campaign=en"><font color="#999">
upgrade to a fully supported browser</font></a>.</font></p>

</font>
</html>

--bcaec545501825242f04bef29a74--

1 answer:

Answer 0 (score: 1)

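Nokogiri is great for parsing HTML, but what you have here is an email. Try using TMail to pull the HTML part out of the email first; then you can parse that part with Nokogiri. Extrapolating from the TMail documentation, you could do something like: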

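  require 'tmail'; require 'nokogiri'
  email = TMail::Mail.load('my_email.eml')
  html_part = email.parts.find { |part| part.content_type == 'text/html' } # body of a multipart mail mixes all parts
  html_doc = Nokogiri::HTML(html_part.body)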
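To make the "get the HTML part first" step concrete, here is a minimal sketch that does it with only the Ruby standard library instead of TMail. It assumes a single-level multipart message like the sample above, with the boundary declared in the top-level Content-Type header. The helper names `split_headers` and `html_part` are hypothetical, and the sketch deliberately ignores nested multiparts, base64 bodies, charset conversion and encoded headers, all of which a real mail library handles. Quoted-printable bodies are decoded with `String#unpack1("M")` (Ruby 2.4+).

```ruby
# Minimal sketch (stdlib only): extract the text/html part of a raw,
# single-level multipart email. Not a full MIME parser.

# Split a raw message or part into [headers_hash, body_string].
# Header names are lowercased and folded continuation lines are unfolded.
def split_headers(raw)
  head, body = raw.split(/\r?\n\r?\n/, 2)
  headers = {}
  head.to_s.gsub(/\r?\n[ \t]+/, " ").each_line do |line|
    name, value = line.chomp.split(":", 2)
    headers[name.strip.downcase] = value.strip if value
  end
  [headers, body.to_s]
end

# Return the decoded body of the first text/html part, or nil if the
# message has no multipart boundary or no HTML part.
def html_part(raw)
  headers, body = split_headers(raw)
  boundary = headers["content-type"].to_s[/boundary="?([^";\s]+)"?/i, 1]
  return nil unless boundary

  # Each part sits between "--boundary" lines; the closing line is "--boundary--".
  body.split(/^--#{Regexp.escape(boundary)}(?:--)?[ \t]*\r?$/).each do |part|
    part_headers, part_body = split_headers(part.sub(/\A\r?\n/, ""))
    next unless part_headers["content-type"].to_s.downcase.start_with?("text/html")

    # Undo quoted-printable encoding (=3D, soft "=" line breaks) when declared.
    encoding = part_headers["content-transfer-encoding"].to_s.downcase
    return encoding == "quoted-printable" ? part_body.unpack1("M") : part_body
  end
  nil
end
```

The returned string is what you would hand to Nokogiri, for example `doc = Nokogiri::HTML(html_part(File.read('my_email.eml')))` followed by `doc.css('a').map { |a| a['href'] }` to list the link targets.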