Cleaning up emails collected over IMAP for insertion into a database

Time: 2013-06-27 08:58:21

Tags: php imap

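I'm writing a script for an internal customer support system. I want to collect emails from our IMAP inbox (hosted on Gmail) and parse them into a database.

What is the best way to clean out frames, badly encoded tags, and messy formatting, so that the result is clean text with minimal formatting?

I know regular expressions will most likely play a big part, but I'm wondering whether this functionality already exists somewhere and I'm just missing it.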

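Edit: to be more specific about what should be removed:

All inline CSS/styles, and all HTML other than simple formatting such as bold, underline, and italics.

Here is the email I'm using as a test case. It's a fairly robust piece of spam I received from ZoneAlarm, and it has a bit of everything.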

<td>
                    <br>
                    <br>


                    <table align="center" bgcolor="#749FD0" border="0" cellpadding="0" cellspacing="0" style="font-family:Arial,Helvetica,sans-serif;font-size:12px;line-height:16px;color:#555555" valign="top" width="700">
                        <tbody>
                            <tr>
                                <td>

                                    <table align="center" border="0" cellpadding="0" cellspacing="0" valign="top" width="680">
                                        <tbody>
                                            <tr>
                                                <td height="10">
                                                    <img border="0" height="1" src="http://download.zonealarm.com/bin/images/email/socialguard/spacer.gif" style="display: block; max-width: 2880px;" width="1"></td>
                                            </tr>
                                        </tbody>
                                    </table>
                                    <table align="center" border="0" cellpadding="0" cellspacing="0" valign="top" width="680">
                                        <tbody>
                                            <tr>
                                                <td height="10" width="10">
                                                    <img border="0" height="10" src="http://www.zonealarm.com/email/campaigns/2013/2013_06_SummerSale/nw.png" style="display: block; max-width: 2880px;" width="10"></td>
                                                <td bgcolor="#E3ECEC" height="10" width="660">

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   

                                                    <a href="http://track.zonealarm.com:80/track?type=click&amp;enid=ZWFzPTEmbXNpZD0xJmF1aWQ9ODY4NjI4Jm1haWxpbmdpZD01NTE0MCZtZXNzYWdlaWQ9MzAwMDAmZGF0YWJhc2VpZD0xODQwMiZzZXJpYWw9MTY3OTIwMzgmZW1haWxpZD1nZWVrc2l4QGdtYWlsLmNvbSZ1c2VyaWQ9MV82MTE3JnRhcmdldGlkPSZmbD0mZXh0cmE9TXVsdGl2YXJpYXRlSWQ9JiYm&amp;&amp;&amp;2000&amp;&amp;&amp;http://www.zonealarm.com?cid=E200246" target="_blank"><img alt="ZoneAlarm by Check Point Software Technologies LTD." border="0" src="http://www.zonealarm.com/email/campaigns/2013/2013_05_MemorialDay/za_transparent.png" width="120" style="display: block; max-width: 2880px;" title="ZoneAlarm by Check Point Software Technologies LTD."></a></td>
                                                <td align="right" style="font-family:Arial,Helvetica,sans-serif" width="150">
                                                    <span style="color:#999999;font-size:12px">Connect with ZoneAlarm</span></td>
                                                <td align="right" valign="middle" width="125">
                                                    <a href="http://track.zonealarm.com:80/track?type=click&amp;enid=ZWFzPTEmbXNpZD0xJmF1aWQ9ODY4NjI4Jm1haWxpbmdpZD01NTE0MCZtZXNzYWdlaWQ9MzAwMDAmZGF0YWJhc2VpZD0xODQwMiZzZXJpYWw9MTY3OTIwMzgmZW1haWxpZD1nZWVrc2l4QGdtYWlsLmNvbSZ1c2VyaWQ9MV82MTE3JnRhcmdldGlkPSZmbD0mZXh0cmE9TXVsdGl2YXJpYXRlSWQ9JiYm&amp;&amp;&amp;2001&amp;&amp;&amp;http://www.facebook.com/ZoneAlarmFirewall" target="_blank"><img alt="ZoneAlarm Facebook" border="0" src="http://www.zonealarm.com/email/campaigns/2013/2013_05_MemorialDay/facebook.png" width="22" title="ZoneAlarm Facebook" style="max-width: 2880px;"></a> <a href="http://track.zonealarm.com:80/track?type=click&amp;enid=ZWFzPTEmbXNpZD0xJmF1aWQ9ODY4NjI4Jm1haWxpbmdpZD01NTE0MCZtZXNzYWdlaWQ9MzAwMDAmZGF0YWJhc2VpZD0xODQwMiZzZXJpYWw9MTY3OTIwMzgmZW1haWxpZD1nZWVrc2l4QGdtYWlsLmNvbSZ1c2VyaWQ9MV82MTE3JnRhcmdldGlkPSZmbD0mZXh0cmE9TXVsdGl2YXJpYXRlSWQ9JiYm&amp;&amp;&amp;2002&amp;&amp;&amp;http://twitter.com/zonealarm" target="_blank"><img alt="ZoneAlarm Twitter" border="0" width="22" src="http://www.zonealarm.com/email/campaigns/2013/2013_05_MemorialDay/twitter.png" title="ZoneAlarm Twitter" style="max-width: 2880px;"></a> <a href="http://track.zonealarm.com:80/track?type=click&amp;enid=ZWFzPTEmbXNpZD0xJmF1aWQ9ODY4NjI4Jm1haWxpbmdpZD01NTE0MCZtZXNzYWdlaWQ9MzAwMDAmZGF0YWJhc2VpZD0xODQwMiZzZXJpYWw9MTY3OTIwMzgmZW1haWxpZD1nZWVrc2l4QGdtYWlsLmNvbSZ1c2VyaWQ9MV82MTE3JnRhcmdldGlkPSZmbD0mZXh0cmE9TXVsdGl2YXJpYXRlSWQ9JiYm&amp;&amp;&amp;2003&amp;&amp;&amp;http://www.youtube.com/zonealarmsecurity" target="_blank"><img alt="ZoneAlarm YouTube" border="0" src="http://www.zonealarm.com/email/campaigns/2013/2013_05_MemorialDay/youtube.png" title="ZoneAlarm YouTube" height="22" style="max-width: 2880px;"></a><img border="0" height="15" src="http://download.zonealarm.com/bin/images/email/socialguard/spacer.gif" width="10" style="max-width: 2880px;"></td>
                                                    <td bgcolor="#E3ECEC" rowspan="6" align="center" valign="top" width="1">
                                                <img align="right" height="32" src="http://download.zonealarm.com/bin/images/emails/welcome/borderx1.png" width="1" style="max-width: 2880px;">
                                                    </td>
                                            </tr>
                                        </tbody>
                                    </table>
                                    <table align="center" border="0" cellpadding="0" cellspacing="0" valign="top" width="680">
                                        <tbody>
                                            <tr>
                                                <td height="10" width="10">
                                                    <img border="0" height="10" src="http://www.zonealarm.com/email/campaigns/2013/2013_06_SummerSale/sw.png" style="display: block; max-width: 2880px;" width="10"></td>
                                                <td bgcolor="#E3ECEC" height="10" width="660">

                                                                                                                                                                                                                                                    

1 answer:

Answer 0: (score: 1)

You can use HTML Purifier, see: http://htmlpurifier.org/
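
As a rough sketch of how that might look for the requirements in the question (keep bold, underline, and italics, drop inline styles and everything else), something like the following could work. The function name `cleanEmailHtml` and the exact whitelist are assumptions made for illustration, and the code assumes HTML Purifier has already been loaded (for example through Composer's autoloader or the library's own `HTMLPurifier.auto.php`).

```php
<?php
// Assumption: HTML Purifier is already loaded before this function is called,
// e.g. require_once 'vendor/autoload.php'; (Composer)
// or   require_once '/path/to/library/HTMLPurifier.auto.php';

/**
 * Hypothetical helper: reduce an HTML email body to minimally formatted markup.
 * Only the whitelisted tags survive; no attributes are listed, so inline
 * style="..." and similar attributes are dropped along with layout tables,
 * images, and links.
 */
function cleanEmailHtml(string $dirtyHtml): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Whitelist of elements to keep (an assumption; adjust to taste).
    $config->set('HTML.Allowed', 'b,strong,i,em,u,br,p');

    // Disable the on-disk definition cache so no writable cache dir is needed.
    $config->set('Cache.DefinitionImpl', null);

    $purifier = new HTMLPurifier($config);

    return trim($purifier->purify($dirtyHtml));
}
```

You would then pass the decoded HTML part of each message (after handling its transfer encoding and charset) to `cleanEmailHtml()` before writing it to the database. Text inside removed tags is kept, so table-heavy mail like the sample above should collapse to its visible text plus whatever simple formatting it contained; extra whitespace may still need collapsing afterwards.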