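According to this list: http://mcdlr.com/8/, the special character ▶ has the HTML entity &#9654;. So I assumed the PHP function htmlentities() would convert an input of ▶ to &#9654;. However, this is what is displayed when I run a string containing that special character through the function and store it in a MySQL database: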
â–¶
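I have declared the charset on the page from which I send the string with <meta charset="utf-8">, and I even tried adding this to the PHP file that processes the string: header('Content-Type: text/html; charset=utf-8');, but it did not help. What am I doing wrong?
Thanks in advance.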
Answer 0 (score: 0)
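When working with UTF-8 characters, the key point is that every layer must use UTF-8; otherwise the bytes are likely to be misread as another encoding such as ISO-8859-1.
Be sure to check:
<meta charset>, keeping in mind that the browser will use the HTTP header instead if it receives one.
The database connection charset, for example:
$dbc = new PDO('mysql:host=localhost;dbname=****;charset=utf8;', '******', '*****');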
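To make the failure mode concrete, here is a minimal sketch of what happens to ▶ when its UTF-8 bytes are read as Windows-1252 somewhere along the way. The helper name misread_as_cp1252 is hypothetical, and the sketch assumes the mbstring extension is loaded.

```php
<?php
// "\xE2\x96\xB6" is the UTF-8 byte sequence for ▶ (U+25B6).
$utf8 = "\xE2\x96\xB6";

// Hypothetical helper: treat the given bytes as Windows-1252 text and
// re-encode them as UTF-8, which is effectively what a layer with the
// wrong charset setting does to the data.
function misread_as_cp1252(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

echo bin2hex($utf8), "\n";            // e296b6
echo misread_as_cp1252($utf8), "\n";  // â–¶
```

One character turns into three because each of the three bytes is decoded as a character of its own. Seeing this kind of output usually means one layer (page, connection, or table) is not using UTF-8.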
Edit:
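I think the htmlentities manual page can be a bit misleading:
htmlentities - Convert all applicable characters to HTML entities
I think it should say "Convert all applicable characters that are available in the translation table to HTML entities". Not every character is in the translation table, and any character that is absent is not converted to an HTML entity. To see which characters are in the translation table, see get_html_translation_table().
For example, running:
print_r( get_html_translation_table(HTML_ENTITIES));
will output: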
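Array
(
    ["] => &quot;
    [&] => &amp;
    [<] => &lt;
    [>] => &gt;
    [ ] => &nbsp;
    [¡] => &iexcl;
    [¢] => &cent;
    [£] => &pound;
    [¤] => &curren;
    [¥] => &yen;
    [¦] => &brvbar;
    [§] => &sect;
    [¨] => &uml;
    [©] => &copy;
    [ª] => &ordf;
    [«] => &laquo;
    [¬] => &not;
    [­] => &shy;
    [®] => &reg;
    [¯] => &macr;
    [°] => &deg;
    [±] => &plusmn;
    [²] => &sup2;
    [³] => &sup3;
    [´] => &acute;
    [µ] => &micro;
    [¶] => &para;
    [·] => &middot;
    [¸] => &cedil;
    [¹] => &sup1;
    [º] => &ordm;
    [»] => &raquo;
    [¼] => &frac14;
    [½] => &frac12;
    [¾] => &frac34;
    [¿] => &iquest;
    [À] => &Agrave;
    [Á] => &Aacute;
    [Â] => &Acirc;
    [Ã] => &Atilde;
    [Ä] => &Auml;
    [Å] => &Aring;
    [Æ] => &AElig;
    [Ç] => &Ccedil;
    [È] => &Egrave;
    [É] => &Eacute;
    [Ê] => &Ecirc;
    [Ë] => &Euml;
    [Ì] => &Igrave;
    [Í] => &Iacute;
    [Î] => &Icirc;
    [Ï] => &Iuml;
    [Ð] => &ETH;
    [Ñ] => &Ntilde;
    [Ò] => &Ograve;
    [Ó] => &Oacute;
    [Ô] => &Ocirc;
    [Õ] => &Otilde;
    [Ö] => &Ouml;
    [×] => &times;
    [Ø] => &Oslash;
    [Ù] => &Ugrave;
    [Ú] => &Uacute;
    [Û] => &Ucirc;
    [Ü] => &Uuml;
    [Ý] => &Yacute;
    [Þ] => &THORN;
    [ß] => &szlig;
    [à] => &agrave;
    [á] => &aacute;
    [â] => &acirc;
    [ã] => &atilde;
    [ä] => &auml;
    [å] => &aring;
    [æ] => &aelig;
    [ç] => &ccedil;
    [è] => &egrave;
    [é] => &eacute;
    [ê] => &ecirc;
    [ë] => &euml;
    [ì] => &igrave;
    [í] => &iacute;
    [î] => &icirc;
    [ï] => &iuml;
    [ð] => &eth;
    [ñ] => &ntilde;
    [ò] => &ograve;
    [ó] => &oacute;
    [ô] => &ocirc;
    [õ] => &otilde;
    [ö] => &ouml;
    [÷] => &divide;
    [ø] => &oslash;
    [ù] => &ugrave;
    [ú] => &uacute;
    [û] => &ucirc;
    [ü] => &uuml;
    [ý] => &yacute;
    [þ] => &thorn;
    [ÿ] => &yuml;
    [Œ] => &OElig;
    [œ] => &oelig;
    [Š] => &Scaron;
    [š] => &scaron;
    [Ÿ] => &Yuml;
    [ƒ] => &fnof;
    [ˆ] => &circ;
    [˜] => &tilde;
    [Α] => &Alpha;
    [Β] => &Beta;
    [Γ] => &Gamma;
    [Δ] => &Delta;
    [Ε] => &Epsilon;
    [Ζ] => &Zeta;
    [Η] => &Eta;
    [Θ] => &Theta;
    [Ι] => &Iota;
    [Κ] => &Kappa;
    [Λ] => &Lambda;
    [Μ] => &Mu;
    [Ν] => &Nu;
    [Ξ] => &Xi;
    [Ο] => &Omicron;
    [Π] => &Pi;
    [Ρ] => &Rho;
    [Σ] => &Sigma;
    [Τ] => &Tau;
    [Υ] => &Upsilon;
    [Φ] => &Phi;
    [Χ] => &Chi;
    [Ψ] => &Psi;
    [Ω] => &Omega;
    [α] => &alpha;
    [β] => &beta;
    [γ] => &gamma;
    [δ] => &delta;
    [ε] => &epsilon;
    [ζ] => &zeta;
    [η] => &eta;
    [θ] => &theta;
    [ι] => &iota;
    [κ] => &kappa;
    [λ] => &lambda;
    [μ] => &mu;
    [ν] => &nu;
    [ξ] => &xi;
    [ο] => &omicron;
    [π] => &pi;
    [ρ] => &rho;
    [ς] => &sigmaf;
    [σ] => &sigma;
    [τ] => &tau;
    [υ] => &upsilon;
    [φ] => &phi;
    [χ] => &chi;
    [ψ] => &psi;
    [ω] => &omega;
    [ϑ] => &thetasym;
    [ϒ] => &upsih;
    [ϖ] => &piv;
    [ ] => &ensp;
    [ ] => &emsp;
    [ ] => &thinsp;
    [‌] => &zwnj;
    [‍] => &zwj;
    [‎] => &lrm;
    [‏] => &rlm;
    [–] => &ndash;
    [—] => &mdash;
    [‘] => &lsquo;
    [’] => &rsquo;
    [‚] => &sbquo;
    [“] => &ldquo;
    [”] => &rdquo;
    [„] => &bdquo;
    [†] => &dagger;
    [‡] => &Dagger;
    [•] => &bull;
    […] => &hellip;
    [‰] => &permil;
    [′] => &prime;
    [″] => &Prime;
    [‹] => &lsaquo;
    [›] => &rsaquo;
    [‾] => &oline;
    [⁄] => &frasl;
    [€] => &euro;
    [ℑ] => &image;
    [℘] => &weierp;
    [ℜ] => &real;
    [™] => &trade;
    [ℵ] => &alefsym;
    [←] => &larr;
    [↑] => &uarr;
    [→] => &rarr;
    [↓] => &darr;
    [↔] => &harr;
    [↵] => &crarr;
    [⇐] => &lArr;
    [⇑] => &uArr;
    [⇒] => &rArr;
    [⇓] => &dArr;
    [⇔] => &hArr;
    [∀] => &forall;
    [∂] => &part;
    [∃] => &exist;
    [∅] => &empty;
    [∇] => &nabla;
    [∈] => &isin;
    [∉] => &notin;
    [∋] => &ni;
    [∏] => &prod;
    [∑] => &sum;
    [−] => &minus;
    [∗] => &lowast;
    [√] => &radic;
    [∝] => &prop;
    [∞] => &infin;
    [∠] => &ang;
    [∧] => &and;
    [∨] => &or;
    [∩] => &cap;
    [∪] => &cup;
    [∫] => &int;
    [∴] => &there4;
    [∼] => &sim;
    [≅] => &cong;
    [≈] => &asymp;
    [≠] => &ne;
    [≡] => &equiv;
    [≤] => &le;
    [≥] => &ge;
    [⊂] => &sub;
    [⊃] => &sup;
    [⊄] => &nsub;
    [⊆] => &sube;
    [⊇] => &supe;
    [⊕] => &oplus;
    [⊗] => &otimes;
    [⊥] => &perp;
    [⋅] => &sdot;
    [⌈] => &lceil;
    [⌉] => &rceil;
    [⌊] => &lfloor;
    [⌋] => &rfloor;
    [〈] => &lang;
    [〉] => &rang;
    [◊] => &loz;
    [♠] => &spades;
    [♣] => &clubs;
    [♥] => &hearts;
    [♦] => &diams;
)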
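So any character not listed above will not be converted to an entity. Note that if you set the ENT_HTML5 flag, the translation table becomes roughly 10 times larger, but it still does not contain (at least on my server) an entry for ▶ (&#9654;). It only has named entities.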
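A quick way to check whether a given character would be converted is to look it up in the translation table directly. This is a minimal sketch; the helper name has_named_entity is hypothetical, and the flags are passed explicitly because the defaults differ between PHP versions.

```php
<?php
// Hypothetical helper: true if $char has a named entity in the table
// selected by $flags (HTML 4.01 by default here).
function has_named_entity(string $char, int $flags = ENT_QUOTES | ENT_HTML401): bool
{
    $table = get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8');
    return isset($table[$char]);
}

$copyright = "\u{00A9}"; // ©
$triangle  = "\u{25B6}"; // ▶

var_dump(has_named_entity($copyright)); // bool(true)
var_dump(has_named_entity($triangle));  // bool(false)

// htmlentities() leaves characters outside the table untouched.
var_dump(htmlentities($triangle, ENT_QUOTES | ENT_HTML401, 'UTF-8') === $triangle); // bool(true)
```

Passing ENT_QUOTES | ENT_HTML5 as the second argument checks the larger HTML5 table instead.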
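If you need to convert every character to its entity, you can use the following function (disclaimer: I did not write it; here is the original source: http://php.net/htmlentities#107985):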
// Unicode-proof htmlentities.
// Returns 'normal' chars as chars and everything else as numeric HTML entities.
function superentities($str) {
    // Decode existing entities first, otherwise they would be double-escaped.
    $str = html_entity_decode(stripslashes($str), ENT_QUOTES, 'UTF-8');
    // Split into an array of single (possibly multi-byte) characters.
    $ar = preg_split('/(?<!^)(?!$)/u', $str);
    $str2 = ''; // initialize the result to avoid an undefined-variable notice
    foreach ($ar as $c) {
        $o = ord($c);
        if ((strlen($c) > 1) ||        /* multi-byte [unicode] */
            ($o < 32 || $o > 126) ||   /* control chars / non-ASCII bytes */
            ($o > 33 && $o < 40) ||    /* quotes + ampersand */
            ($o > 59 && $o < 63)       /* html: < = > */
        ) {
            // Convert to a numeric entity; the map covers U+0000 to U+10FFFF.
            $c = mb_encode_numericentity($c, array(0x0, 0x10ffff, 0, 0x1fffff), 'UTF-8');
        }
        $str2 .= $c;
    }
    return $str2;
}
So, using your example ▶, you can do the following:
var_dump(superentities('▶')); // outputs string(7) "&#9654;"
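If the per-character loop is more than you need, a shorter variant can be sketched with htmlspecialchars() followed by a single mb_encode_numericentity() call over the whole string. The function name numeric_entities is hypothetical, and unlike superentities() this sketch does not decode existing entities first, so input that is already encoded will be double-escaped.

```php
<?php
// Hypothetical helper: escape HTML-significant ASCII characters, then turn
// every code point from U+0080 upward into a decimal numeric entity.
function numeric_entities(string $str): string
{
    // Handles & < > " ' and leaves everything else alone.
    $escaped = htmlspecialchars($str, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Conversion map: [start, end, offset, mask]. Starting at 0x80 keeps
    // the ASCII entities produced above from being encoded a second time.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

echo numeric_entities("\u{25B6}"), "\n";      // &#9654;
echo numeric_entities("a < \u{25B6}"), "\n";  // a &lt; &#9654;
```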
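That said, I recommend storing everything in the database unencoded. It is generally preferable to encode appropriately just before output to the browser. That way, if you ever need to change how you encode, you do not have to decode the stored data and re-encode it another way. For this to work, you must make sure every encoding is correctly set to UTF-8, as described in my original answer.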
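As a rough illustration of "store raw, encode on output", the sketch below keeps the original string untouched and escapes it only when building the HTML. The helper name e and the sample string are assumptions made for the example; $stored stands in for a value read back from a UTF-8 database column.

```php
<?php
// Hypothetical output helper: escape only the characters that are
// significant in HTML, leaving ▶ and other UTF-8 characters as they are.
function e(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Stand-in for a raw, unencoded value fetched from the database.
$stored = "\u{25B6} Play <now>";

// Encode at the last moment, when the value is placed into the page.
echo '<p>', e($stored), '</p>', "\n"; // <p>▶ Play &lt;now&gt;</p>
```

With the page served as UTF-8, the browser renders ▶ directly, so no entity for it is needed at all.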