htmlentities() does not convert a right-pointing triangle to an HTML entity

Asked: 2014-04-26 23:47:40

Tags: php utf-8 special-characters html-entities

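According to this list: http://mcdlr.com/8/, the special character ▶ has the HTML entity &amp;#9654; (that is, the numeric reference for code point 9654). So I assumed that the PHP function htmlentities() would convert an input of ▶ to that entity. However, this is what shows up when I run a string containing that character through the function and store it in a MySQL database: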

  

â–¶

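I have already set <meta charset="utf-8"> on the page I am sending the string from, and I even tried adding this to the PHP file that processes the string: header('Content-Type: text/html; charset=utf-8');, but it did not help. What am I doing wrong?

Thanks in advance.

1 Answer:

Answer 0: (score: 0)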

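When dealing with UTF-8 characters, the key is that every step in the chain must use UTF-8. If any one step does not, the bytes can end up being treated as a single-byte encoding such as ISO-8859-1, and multi-byte characters turn into garbage.

Be sure to check:

  • The collation (and character set) of the table columns in the database.
  • If the value is hard-coded in a PHP file, that the file itself is saved as UTF-8.
  • If the data comes from the browser, that the PHP Content-Type header specifies UTF-8. You can generally omit <meta charset> from the HTML, because the browser uses the HTTP header when it receives one.
  • That the database connection specifies the encoding, like this: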

$dbc = new PDO('mysql:host=localhost;dbname=****;charset=utf8;', '******', '*****');
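
To see why a single mismatched step produces output like the asker's, here is a minimal sketch that reinterprets the UTF-8 bytes of ▶ as Windows-1252. It assumes the mbstring extension and PHP 7 or later (for the "\u{...}" escape), and that the misreading in the question was Windows-1252 rather than strict ISO-8859-1; the variable names are illustrative only.

```php
<?php
// U+25B6 BLACK RIGHT-POINTING TRIANGLE, written as an escape so this file's
// own encoding does not affect the demonstration.
$triangle = "\u{25B6}";

// In UTF-8 the character is three bytes: E2 96 B6.
$utf8Hex = bin2hex($triangle);
echo $utf8Hex, "\n"; // e296b6

// Pretend those three bytes are Windows-1252 text and convert them "to UTF-8",
// which is effectively what happens when one layer assumes the wrong charset.
$mojibake = mb_convert_encoding($triangle, 'UTF-8', 'Windows-1252');
echo $mojibake, "\n"; // three separate characters instead of one triangle

// A quick sanity check that input really is valid UTF-8 before storing it.
var_dump(mb_check_encoding($triangle, 'UTF-8')); // bool(true)
```

Each of the three bytes becomes its own character, which is the kind of three-character garbage shown in the question.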

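Edit

I think the htmlentities manual page may be a little misleading:

  

htmlentities - Convert all applicable characters to HTML entities

I think it should say, "Convert all applicable characters that are available in the translation table to HTML entities". Not every character is in the translation table, and any character that is missing from it will not be converted to an HTML entity. To see which characters are in the translation table, see get_html_translation_table().

For example, running: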

print_r( get_html_translation_table(HTML_ENTITIES));

will output:

Array
(
    ["] => &quot;
    [&] => &amp;
    [<] => &lt;
    [>] => &gt;
    [ ] => &nbsp;
    [¡] => &iexcl;
    [¢] => &cent;
    [£] => &pound;
    [¤] => &curren;
    [¥] => &yen;
    [¦] => &brvbar;
    [§] => &sect;
    [¨] => &uml;
    [©] => &copy;
    [ª] => &ordf;
    [«] => &laquo;
    [¬] => &not;
    [­] => &shy;
    [®] => &reg;
    [¯] => &macr;
    [°] => &deg;
    [±] => &plusmn;
    [²] => &sup2;
    [³] => &sup3;
    [´] => &acute;
    [µ] => &micro;
    [¶] => &para;
    [·] => &middot;
    [¸] => &cedil;
    [¹] => &sup1;
    [º] => &ordm;
    [»] => &raquo;
    [¼] => &frac14;
    [½] => &frac12;
    [¾] => &frac34;
    [¿] => &iquest;
    [À] => &Agrave;
    [Á] => &Aacute;
    [Â] => &Acirc;
    [Ã] => &Atilde;
    [Ä] => &Auml;
    [Å] => &Aring;
    [Æ] => &AElig;
    [Ç] => &Ccedil;
    [È] => &Egrave;
    [É] => &Eacute;
    [Ê] => &Ecirc;
    [Ë] => &Euml;
    [Ì] => &Igrave;
    [Í] => &Iacute;
    [Î] => &Icirc;
    [Ï] => &Iuml;
    [Ð] => &ETH;
    [Ñ] => &Ntilde;
    [Ò] => &Ograve;
    [Ó] => &Oacute;
    [Ô] => &Ocirc;
    [Õ] => &Otilde;
    [Ö] => &Ouml;
    [×] => &times;
    [Ø] => &Oslash;
    [Ù] => &Ugrave;
    [Ú] => &Uacute;
    [Û] => &Ucirc;
    [Ü] => &Uuml;
    [Ý] => &Yacute;
    [Þ] => &THORN;
    [ß] => &szlig;
    [à] => &agrave;
    [á] => &aacute;
    [â] => &acirc;
    [ã] => &atilde;
    [ä] => &auml;
    [å] => &aring;
    [æ] => &aelig;
    [ç] => &ccedil;
    [è] => &egrave;
    [é] => &eacute;
    [ê] => &ecirc;
    [ë] => &euml;
    [ì] => &igrave;
    [í] => &iacute;
    [î] => &icirc;
    [ï] => &iuml;
    [ð] => &eth;
    [ñ] => &ntilde;
    [ò] => &ograve;
    [ó] => &oacute;
    [ô] => &ocirc;
    [õ] => &otilde;
    [ö] => &ouml;
    [÷] => &divide;
    [ø] => &oslash;
    [ù] => &ugrave;
    [ú] => &uacute;
    [û] => &ucirc;
    [ü] => &uuml;
    [ý] => &yacute;
    [þ] => &thorn;
    [ÿ] => &yuml;
    [Œ] => &OElig;
    [œ] => &oelig;
    [Š] => &Scaron;
    [š] => &scaron;
    [Ÿ] => &Yuml;
    [ƒ] => &fnof;
    [ˆ] => &circ;
    [˜] => &tilde;
    [Α] => &Alpha;
    [Β] => &Beta;
    [Γ] => &Gamma;
    [Δ] => &Delta;
    [Ε] => &Epsilon;
    [Ζ] => &Zeta;
    [Η] => &Eta;
    [Θ] => &Theta;
    [Ι] => &Iota;
    [Κ] => &Kappa;
    [Λ] => &Lambda;
    [Μ] => &Mu;
    [Ν] => &Nu;
    [Ξ] => &Xi;
    [Ο] => &Omicron;
    [Π] => &Pi;
    [Ρ] => &Rho;
    [Σ] => &Sigma;
    [Τ] => &Tau;
    [Υ] => &Upsilon;
    [Φ] => &Phi;
    [Χ] => &Chi;
    [Ψ] => &Psi;
    [Ω] => &Omega;
    [α] => &alpha;
    [β] => &beta;
    [γ] => &gamma;
    [δ] => &delta;
    [ε] => &epsilon;
    [ζ] => &zeta;
    [η] => &eta;
    [θ] => &theta;
    [ι] => &iota;
    [κ] => &kappa;
    [λ] => &lambda;
    [μ] => &mu;
    [ν] => &nu;
    [ξ] => &xi;
    [ο] => &omicron;
    [π] => &pi;
    [ρ] => &rho;
    [ς] => &sigmaf;
    [σ] => &sigma;
    [τ] => &tau;
    [υ] => &upsilon;
    [φ] => &phi;
    [χ] => &chi;
    [ψ] => &psi;
    [ω] => &omega;
    [ϑ] => &thetasym;
    [ϒ] => &upsih;
    [ϖ] => &piv;
    [ ] => &ensp;
    [ ] => &emsp;
    [ ] => &thinsp;
    [‌] => &zwnj;
    [‍] => &zwj;
    [‎] => &lrm;
    [‏] => &rlm;
    [–] => &ndash;
    [—] => &mdash;
    [‘] => &lsquo;
    [’] => &rsquo;
    [‚] => &sbquo;
    [“] => &ldquo;
    [”] => &rdquo;
    [„] => &bdquo;
    [†] => &dagger;
    [‡] => &Dagger;
    [•] => &bull;
    […] => &hellip;
    [‰] => &permil;
    [′] => &prime;
    [″] => &Prime;
    [‹] => &lsaquo;
    [›] => &rsaquo;
    [‾] => &oline;
    [⁄] => &frasl;
    [€] => &euro;
    [ℑ] => &image;
    [℘] => &weierp;
    [ℜ] => &real;
    [™] => &trade;
    [ℵ] => &alefsym;
    [←] => &larr;
    [↑] => &uarr;
    [→] => &rarr;
    [↓] => &darr;
    [↔] => &harr;
    [↵] => &crarr;
    [⇐] => &lArr;
    [⇑] => &uArr;
    [⇒] => &rArr;
    [⇓] => &dArr;
    [⇔] => &hArr;
    [∀] => &forall;
    [∂] => &part;
    [∃] => &exist;
    [∅] => &empty;
    [∇] => &nabla;
    [∈] => &isin;
    [∉] => &notin;
    [∋] => &ni;
    [∏] => &prod;
    [∑] => &sum;
    [−] => &minus;
    [∗] => &lowast;
    [√] => &radic;
    [∝] => &prop;
    [∞] => &infin;
    [∠] => &ang;
    [∧] => &and;
    [∨] => &or;
    [∩] => &cap;
    [∪] => &cup;
    [∫] => &int;
    [∴] => &there4;
    [∼] => &sim;
    [≅] => &cong;
    [≈] => &asymp;
    [≠] => &ne;
    [≡] => &equiv;
    [≤] => &le;
    [≥] => &ge;
    [⊂] => &sub;
    [⊃] => &sup;
    [⊄] => &nsub;
    [⊆] => &sube;
    [⊇] => &supe;
    [⊕] => &oplus;
    [⊗] => &otimes;
    [⊥] => &perp;
    [⋅] => &sdot;
    [⌈] => &lceil;
    [⌉] => &rceil;
    [⌊] => &lfloor;
    [⌋] => &rfloor;
    [〈] => &lang;
    [〉] => &rang;
    [◊] => &loz;
    [♠] => &spades;
    [♣] => &clubs;
    [♥] => &hearts;
    [♦] => &diams;
)

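So any character not listed above will not be converted to an entity. Note that if you set the ENT_HTML5 flag, the translation table becomes roughly 10 times larger, but it still does not contain an entity for ▶ (at least on my server). The table holds only named entities.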
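
Rather than scanning the printed table by eye, you can look a character up directly. The following is a rough sketch, assuming PHP 7 or later and UTF-8 input; the variable names are mine, and the ENT_HTML5 result is printed rather than asserted because it may depend on the PHP build.

```php
<?php
$char = "\u{25B6}"; // the right-pointing triangle from the question

// Translation tables keyed by character, for the default HTML 4.01 set and for HTML5.
$html401 = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$html5   = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5, 'UTF-8');

$inHtml401 = isset($html401[$char]);
$inHtml5   = isset($html5[$char]);

var_dump($inHtml401);        // bool(false): the triangle is not in the table printed above
var_dump($inHtml5);          // reported as false in this answer; check on your own server
var_dump(isset($html401["\u{20AC}"])); // bool(true): the euro sign maps to &euro;

// Characters missing from the table pass through htmlentities() unchanged.
$unchanged = htmlentities($char, ENT_QUOTES | ENT_HTML401, 'UTF-8') === $char;
var_dump($unchanged);        // bool(true)
```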

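If you need to convert all characters to their respective entities, you could use the following function (disclaimer: I did not write it; the original source is http://php.net/htmlentities#107985):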

// Unicode-proof htmlentities.
// Returns 'normal' chars as chars and weirdos as numeric HTML entities.
function superentities( $str ){
    // get rid of existing entities, otherwise they would be double-escaped
    $str = html_entity_decode(stripslashes($str),ENT_QUOTES,'UTF-8');
    $ar = preg_split('/(?<!^)(?!$)/u', $str );  // split into an array of (possibly multi-byte) characters
    $str2 = '';  // initialize the result to avoid an undefined-variable warning
    foreach ($ar as $c){
        $o = ord($c);
        if ( (strlen($c) > 1) || /* multi-byte [unicode] */
            ($o <32 || $o > 126) || /* <- control / latin weirdos -> */
            ($o >33 && $o < 40) ||/* quotes + ampersand */
            ($o >59 && $o < 63) /* html */
        ) {
            // convert to numeric entity
            $c = mb_encode_numericentity($c,array (0x0, 0xffff, 0, 0xffff), 'UTF-8');
        }
        $str2 .= $c;
    }
    return $str2;
}

So, using your example, you could do the following:

var_dump(superentities('▶')); // outputs string(7) "&#9654;"
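
If all you need is numeric entities for non-ASCII characters, a shorter variant is possible. This is only a sketch under my own assumptions, not the function from the manual comment: numeric_entities is a hypothetical name, it requires the mbstring extension, and unlike superentities() it does not decode existing entities first and it leaves the markup characters to htmlspecialchars().

```php
<?php
// Hypothetical helper: escape HTML markup characters, then turn every
// non-ASCII character into a decimal numeric entity.
function numeric_entities(string $text): string
{
    // Step 1: &, <, >, and quotes become named entities (all plain ASCII).
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Step 2: conversion map [first code point, last code point, offset, mask]
    // covering everything from U+0080 up to U+10FFFF.
    $convmap = [0x80, 0x10FFFF, 0, 0xFFFFFF];

    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

echo numeric_entities("\u{25B6} Tom & Jerry <3"), "\n";
// prints: &#9654; Tom &amp; Jerry &lt;3
```

Because step 1 emits only ASCII, step 2 never touches the entities it produced, so nothing is escaped twice within a single call.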

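That said, I would recommend storing everything in the database unencoded. It is generally preferable to encode appropriately just before outputting to the browser. That way, if you ever need to change how the data is encoded, you do not have to decode it and re-encode it some other way. For this to work, you must make sure every encoding is correctly set to UTF-8, as described in my original answer.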
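
As a rough illustration of encoding at output time, the sketch below keeps the stored value as raw UTF-8 and escapes it only when building the HTML. The helper name escape_for_html and the sample string are hypothetical, and the stored value stands in for something read from the database.

```php
<?php
// Hypothetical helper: escape a raw UTF-8 string for use in HTML text or attributes.
function escape_for_html(string $raw): string
{
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of returning an empty string.
    return htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Stand-in for a value stored unencoded in the database.
$stored = "\u{25B6} Play <b>now</b>";

// Declare UTF-8 to the browser before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$html = '<p>' . escape_for_html($stored) . '</p>';
echo $html, "\n";
// prints: <p>▶ Play &lt;b&gt;now&lt;/b&gt;</p>
```

The triangle is sent as a literal UTF-8 character, which the browser renders correctly as long as the page is declared as UTF-8, so no entity for it is needed at all.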