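PHP - How to handle a "utf-16" (US-ASCII encoded) HTML string so that it saves correctly in DOMDocument?

Asked: 2018-11-29 06:36:20

Tags: php html domdocument utf-16

I am working on a PHP project that fetches emails and displays them on screen. In one email, it receives the following HTML: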

    <html>
    <head>

    <META http-equiv="Content-Type" content="text/html; charset=utf-16">

    <style type="text/css">
          TD {
          font-family: Verdana,Tahoma,Arial, "Sans Serif";
          font-size: 10pt;
          }
          BODY {
          font-family: Verdana,Tahoma,Arial, "Sans Serif";
          font-size: 10pt;
          }
        </style>



    </head>

      <body bgcolor="#eeeeee"><img width="1" height="1" alt="" src="https://trademe.tmcdn.co.nz/images/1pixel.gif?gen=20181128"><table cellspacing="0" cellpadding="0" width="700" bgcolor="white" align="center" style="border-left: 1px #CCCCCC solid; border-right: 1px #CCCCCC solid; border-top: 1px #CCCCCC solid;">
      <tr>

        <td height="20" colspan="4">&nbsp;</td>

      </tr>

      <tr>

        <td width="20"></td>

        <td><a href="https://www.trademe.co.nz/Track.aspx?site=2018112820201&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;"><img border="0" alt="Trade Me Logo" width="246" height="48" src="https://trademe.tmcdn.co.nz/images/new-brand-2016/common/tm-logo-2016-246x48-v1.gif?gen=2018112820201"></a><img src="https://api.trademe.co.nz/tracking/collect?evt=open&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937&amp;tid=EB71C99D-BEB4-445F-B62B-C172AC5A4CF4"></td>

        <td align="center"></td>

        <td width="20"></td>

      </tr>

      <tr>

        <td width="20"></td>

        <td colspan="2">

          <hr size="0" color="#CCCCCC">

          <center><small>Security Note: Trade Me will never ask you for your password via email</small></center>

          <hr size="0" color="#CCCCCC">

        </td>

        <td width="20"></td>

      </tr>

      <tr>

        <td width="20"></td>

        <td colspan="2" style="padding-left: 10px; padding-top: 10px;"><small>

      This is an automated email regarding listing #: 1847238571</small><br><br>

    Hi Matthew,

    <br><br><div>

      A member has asked a question on your listing for "2.4KW 2400W 3KVA 24VDC Pure Sine Wave Power Inverter Solar Caravan Off Grid LCD".

    </div><br><table width="100%" cellpadding="3" cellspacing="0" border="0">

            <tr>

              <td align="center" width="20"><img width="20" height="20" alt="" src="https://trademe.tmcdn.co.nz/images/icon_question.gif">&nbsp;</td>

              <td>what is the warranty like? &nbsp;&nbsp;<small><i>posted by:&nbsp;</i></small>&nbsp;<b><a href="https://www.trademe.co.nz/Members/Listings.aspx?member=4187691&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">matihegarty</a></b>

    (<a href="https://www.trademe.co.nz/Members/Feedback.aspx?member=4187691&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">5</a>&nbsp;<a href="https://www.trademe.co.nz/Members/Feedback.aspx?member=4187691&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937"><img align="absmiddle" border="0" src="https://www.trademe.co.nz/images/star.gif"></a>)

      &nbsp;&nbsp;&nbsp;<small>8:54 pm, Wed 28 Nov</small></td>

            </tr>

          </table><br><br><center><b><font size="3"><a href="https://www.trademe.co.nz/a.asp?id=1847238571&amp;qna=true#qna&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">Answer this question</a></font></b></center><br><br><div>

      We recommend you answer all questions on your listings to help buyers make informed decisions. Questions on vehicle listings created in Trade Me Motors will be displayed automatically. For other listings, questions will only be displayed if answered.

    </div><br><br>

    Happy trading!

    <br><br>

    The Trade Me team

    <br><a href="https://www.trademe.co.nz/?tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">www.trademe.co.nz</a><br><br><small>

      If you don't wish to receive these emails or prefer plain text email, please update your

      <a href="https://www.trademe.co.nz/MyTradeMe/EmailOptions.aspx?tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">email options</a></small></td>

        <td width="20"></td>

      </tr>

      <tr>

        <td colspan="3">

          <table cellspacing="0" cellpadding="0" border="0" width="100%" align="center" style="background-color:White;">

            <tr>

              <td align="center"><br><small><img width="7" height="8" src="https://trademe.tmcdn.co.nz/images/3/common/triangle.gif">&nbsp;<font color="#666666">advertisement</font></small><br><br></td>

            </tr>

          </table>

          <table cellspacing="0" cellpadding="0" border="0" width="100%" align="center" style="background-color:#9A9A9A;">

            <tr>

              <td><a href="https://www.trademe.co.nz/Link.aspx?i=101247"><img style="border-width:0;" src="https://trademe.tmcdn.co.nz/photoserver/adserver/TMI0003-700x70-mates-FA.png?e=" alt="" width="700" height="70"></a></td>

            </tr>

          </table>

        </td>

      </tr>

    </table>

      </body>

    </html>

My program does this:

    $cleanMessage = new DOMDocument();
    @$cleanMessage->loadHTML($this->bodyHTML); //To clean the html code for unclosed td table tags and other 

    $this->message = $cleanMessage->saveHTML();

But my output is:

  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <�html> <�head> <�meta http-equiv="Content-Type" content="text/html; charset=utf-16"> <�style type="text/css"> TD { font-family: Verdana,Tahoma,Arial, "Sans Serif"; font-size: 10pt; } BODY { font-family: Verdana,Tahoma,Arial, "Sans Serif"; font-size: 10pt; } <�/style> <�/head> <�body bgcolor="#eeeeee"> <�img width="1" height="1" alt="" src="https://trademe.tmcdn.co.nz/images/1pixel.gif?gen=20181128"> <�table cellspacing="0" cellpadding="0" width="700" bgcolor="white" align="center" style="border-left: 1px #CCCCCC solid; border-right: 1px #CCCCCC solid; border-top: 1px #CCCCCC solid;"> <�tr> <�td height="20" colspan="4">�<�/td> <�/tr> <�tr> <�td width="20"> <�/td> <�td> <�a href="https://www.trademe.co.nz/Track.aspx?site=2018112820201&tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;"> <�img border="0" alt="Trade Me Logo" width="246" height="48" src="https://trademe.tmcdn.co.nz/images/new-brand-2016/common/tm-logo-2016-246x48-v1.gif?gen=2018112820201"> <�/a> <�img src="https://api.trademe.co.nz/tracking/collect?evt=open&tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937&tid=EB71C99D-BEB4-445F-B62B-C172AC5A4CF4"> <�/td> <�td align="center"> <�/td> <�td width="20"> <�/td> <�/tr> <�tr> <�/td> <�td colspan="2"> <�hr size="0" color="#CCCCCC"> <�center> <�small>Security Note: Trade Me will never ask you for your password via email<�/small> <�/center> <�/td> <�td width="20"> <�/td> <�/tr> <�tr> <�/td> <�td colspan="2" style="padding-left: 10px; padding-top: 10px;"> <�small> This is an automated email regarding listing #: 1847238571
/small>

Hi Matthew,

  A member has asked a question on your listing for "2.4KW 2400W 3KVA 24VDC Pure Sine Wave Power Inverter Solar Caravan Off Grid LCD". <�/div>
<table width="100%" cellpadding="3" cellspacing="0" border="0"> <�tr> <�td align="center" width="20"> <�img width="20" height="20" alt="" src="https://trademe.tmcdn.co.nz/images/icon_question.gif">�<�/td> what is the warranty like? �� <�small> <�i>posted by:�<�/i> <�/small>�<�b> <�a href="https://www.trademe.co.nz/Members/Listings.aspx?member=4187691&tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">matihegarty<�/a> <�/b> (<�a href="https://www.trademe.co.nz/Members/Feedback.aspx?member=4187691&tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">5<�/a>�<�a href="https://www.trademe.co.nz/Members/Feedback.aspx?member=4187691&tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937"> <�img align="absmiddle" border="0" src="https://www.trademe.co.nz/images/star.gif"> <�/a>)���<�small>8:54 pm, Wed 28 Nov <�/tr> <�/table>

Answer this question<�/a> <�/font> <�/b> <�/center> <�br> <�br> <�div> We recommend you answer all questions on your listings to help buyers make informed decisions. Questions on vehicle listings created in Trade Me Motors will be displayed automatically. For other listings, questions will only be displayed if answered. <�/div> <�br> <�br> Happy trading!

The Trade Me team
www.trademe.co.nz

If you don't wish to receive these emails or prefer plain text email, please update your <�a href="https://www.trademe.co.nz/MyTradeMe/EmailOptions.aspx?tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">email options<�/a> <�/small> <�/td> <�td width="20"> <�/td> <�/tr> <�tr> <�td colspan="3"> <table cellspacing="0" cellpadding="0" border="0" width="100%" align="center" style="background-color:White;"> <�tr> <�td align="center"> <�br> <�small> <�img width="7" height="8" src="https://trademe.tmcdn.co.nz/images/3/common/triangle.gif">�advertisement<�/font> <�/small> <�br> <�br> <�/td> <�/tr> <�/table> <�table cellspacing="0" cellpadding="0" border="0" width="100%" align="center" style="background-color:#9A9A9A;"> <�tr> <�td> <�a href="https://www.trademe.co.nz/Link.aspx?i=101247"> <�img style="border-width:0;" src="https://trademe.tmcdn.co.nz/photoserver/adserver/TMI0003-700x70-mates-FA.png?e=" alt="" width="700" height="70"> <�/a> <�/td> <�/tr> <�/table> <�/td> <�/tr> <�/table> <�/body> <�/html>

I have tried:

1.

    $this->bodyHTML = mb_convert_encoding($this->bodyHTML,'UTF-8','utf-16');
    $this->bodyHTML = mb_convert_encoding($this->bodyHTML,'HTML-ENTITIES','UTF-8'); //both lines together

2.

    $this->bodyHTML = mb_convert_encoding($this->bodyHTML,'HTML-ENTITIES','UTF-16');

But it still shows garbled text or Chinese characters.

What is the correct way to display this HTML properly?

1 Answer:

Answer 0: (score: 1)

In your HTML, replace the charset utf-16 with utf-8, or with ISO-8859-1 if you see strange characters:

    $this->bodyHTML = str_replace("charset=utf-16","charset=utf-8", $this->bodyHTML);
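
A minimal sketch of how that replacement could sit in front of the DOMDocument round trip. It assumes the message body is really single-byte text (US-ASCII, as the question title says) and that only the declared charset is wrong. `cleanEmailHtml()` and the sample string are hypothetical names used for illustration. `str_ireplace` is swapped in for `str_replace` so that `UTF-16` is matched in any letter case, and `libxml_use_internal_errors` is used in place of the `@` operator.

```php
<?php
// Sketch only: cleanEmailHtml() is a hypothetical helper, not part of the original code.
function cleanEmailHtml(string $bodyHTML): string
{
    // Assumption: the bytes are single-byte (ASCII/UTF-8 compatible) text and only
    // the declared charset is wrong, so rewrite the declaration to match the bytes.
    $bodyHTML = str_ireplace('charset=utf-16', 'charset=utf-8', $bodyHTML);

    $doc = new DOMDocument();

    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($bodyHTML);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // saveHTML() serializes the repaired tree (unclosed td/tr tags get closed).
    return $doc->saveHTML();
}

// Hypothetical sample: wrong charset declaration plus unclosed td/tr tags.
$sample = '<html><head><META http-equiv="Content-Type" content="text/html; charset=utf-16">'
        . '</head><body><table><tr><td>Hi Matthew,</table></body></html>';

$clean = cleanEmailHtml($sample);
echo $clean;
```

If the body really were UTF-16 bytes, this substitution alone would not be enough; the string would first need converting, for example with `mb_convert_encoding($bodyHTML, 'UTF-8', 'UTF-16')`, before the declaration is rewritten.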