A related question is "Preventing DOMDocument::loadHTML() from converting entities", but it did not produce a solution.
This code:
$html = "<span>&#127363;&#127348;&#127362;&#127363;</span>";
$doc = new DOMDocument;
$doc->resolveExternals = false;
$doc->substituteEntities = false;
$doc->loadHTML($html);
foreach ($doc->getElementsByTagName('span') as $node)
{
    var_dump($node->nodeValue);
    var_dump(htmlentities($node->nodeValue));
    var_dump(htmlentities(iconv('UTF-8', 'ISO-8859-1', $node->nodeValue)));
}
produces this output:
string(16) "🆃🅴🆂🆃"
string(16) "🆃🅴🆂🆃"
string(0) ""
But what I want is the numeric entities: &#127363;&#127348;&#127362;&#127363;
I am running PHP 5.6.29, and ini_get("default_charset") returns UTF-8.
Answer 0: (score: 0)
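After reading more at http://php.net/manual/en/function.htmlentities.php, I noticed that htmlentities() does not encode all Unicode characters. Someone posted a superentities function in the comments, but it did not seem to work for me. The UTF8entities function did.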
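To illustrate the htmlentities() limitation described above, here is a minimal sketch. It assumes the script file is saved as UTF-8; the variable names are only for illustration. A character with a named entity is replaced, while the squared letter passes through untouched.

```php
<?php
// htmlentities() only replaces characters that have a named HTML entity.
$accented = htmlentities("é", ENT_QUOTES, "UTF-8");
$emoji    = htmlentities("🆃", ENT_QUOTES, "UTF-8");

var_dump($accented);        // "é" has a named entity (eacute), so it is replaced
var_dump($emoji === "🆃");  // U+1F183 has no named entity, so it comes back unchanged
```

This is why a separate step that emits numeric entities is needed for such characters.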
Here is my modified code together with the two functions from the comments section; it gives me the HTML-encoded values I wanted.
$html = "<span>&#127363;&#127348;&#127362;&#127363;</span>";
$doc = new DOMDocument;
$doc->resolveExternals = false;
$doc->substituteEntities = false;
$doc->loadHTML($html);
foreach ($doc->getElementsByTagName('span') as $node)
{
    var_dump(UTF8entities($node->nodeValue));
}
function UTF8entities($content = "") {
    // Split into an array of whole (possibly multi-byte) characters
    $characterArray = preg_split('//u', $content, -1, PREG_SPLIT_NO_EMPTY);
    $rv = ""; // initialise to avoid an undefined-variable notice
    foreach ($characterArray as $character) {
        $rv .= unicode_entity_replace($character);
    }
    return $rv;
}
function unicode_entity_replace($c) { // m. perez
    $h = ord($c[0]); // the lead byte determines the sequence length
    if ($h <= 0x7F) {
        return $c; // plain ASCII, leave as-is
    } else if ($h < 0xC2) {
        return $c; // stray continuation or invalid lead byte, leave as-is
    }
    if ($h <= 0xDF) { // 2-byte sequence
        $h = ($h & 0x1F) << 6 | (ord($c[1]) & 0x3F);
        return "&#" . $h . ";";
    } else if ($h <= 0xEF) { // 3-byte sequence
        $h = ($h & 0x0F) << 12 | (ord($c[1]) & 0x3F) << 6 | (ord($c[2]) & 0x3F);
        return "&#" . $h . ";";
    } else if ($h <= 0xF4) { // 4-byte sequence: only 3 payload bits in the lead byte
        $h = ($h & 0x07) << 18 | (ord($c[1]) & 0x3F) << 12 | (ord($c[2]) & 0x3F) << 6 | (ord($c[3]) & 0x3F);
        return "&#" . $h . ";";
    }
    return $c; // lead bytes above 0xF4 are not valid UTF-8
}
This returns:
string(36) "&#127363;&#127348;&#127362;&#127363;"
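As an aside, if the mbstring extension happens to be available (an assumption; the question does not say), the built-in mb_encode_numericentity() may be able to do the same job without hand-decoding UTF-8. The sketch below is not the approach used above: the helper name encode_non_ascii is hypothetical, and $decoded stands in for the value that $node->nodeValue holds after loadHTML().

```php
<?php
// Hypothetical helper: replace every non-ASCII character with a decimal
// numeric entity. Assumes the mbstring extension is loaded.
function encode_non_ascii($text)
{
    // Conversion map: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF.
    $convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

// Stand-in for $node->nodeValue, i.e. the already-decoded UTF-8 text.
$decoded = "🆃🅴🆂🆃";
$encoded = encode_non_ascii($decoded);
var_dump($encoded);
```

Because the map starts at 0x80, ASCII text passes through unchanged, so markup characters such as < and & would still need htmlspecialchars() if the result is going back into HTML.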