Question

I'm using curl to fetch a page's source code. It works fine, but what comes back is not plain HTML. It returns content like the following:

The HTML looks like this: ti&#7871;t, #&&#%$^% and so on...

I really want to convert it to normal text. I tried PHP's decode functions, but had no luck at all.

Thanks


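Edit: Thank you, sir. I tried $fixed_result = html_entity_decode($result, ENT_COMPAT, "UTF-8"); and it worked like a charm, but some characters turned into " ", for example: "S CKH CH"

I don't know what this is. Thank you, sir.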

Answer 1

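This looks like HTML entity encoding. You should be able to convert it back to normal characters with html_entity_decode, specifying the appropriate character set, e.g.: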

$fixed_result = html_entity_decode($result, ENT_COMPAT, "UTF-8");
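
To illustrate, here is a minimal sketch of that call on a made-up input. The sample strings are assumptions modeled on the fragment quoted in the question, not the asker's real data. The second half covers a case that may or may not apply here: if the page was escaped twice (so the source contains "&amp;#7871;"), a single pass only gets back to "&#7871;" and a second pass is needed.

```php
<?php
// Hypothetical sample input, modeled on the fragment quoted in the question.
$result = 'ti&#7871;t &amp; more';

// Decode numeric and named entities into UTF-8 characters.
$fixed_result = html_entity_decode($result, ENT_COMPAT, 'UTF-8');
echo $fixed_result, "\n"; // tiết & more

// Assumed double-escaped input: the ampersand itself was encoded as &amp;.
$double = 'ti&amp;#7871;t';
$once  = html_entity_decode($double, ENT_COMPAT, 'UTF-8'); // ti&#7871;t
$twice = html_entity_decode($once, ENT_COMPAT, 'UTF-8');   // tiết
echo $once, "\n", $twice, "\n";
```

The decoded string is UTF-8 encoded. If it is then shown by a page or terminal that assumes a different charset, non-ASCII characters can appear as placeholder marks, which may be what the follow-up edit in the question describes.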

Answer 2

Following ajreal's comment, take a look at the manual page for html_entity_decode and try the fallback solution:

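// Fallback from the manual, originally meant for users prior to PHP 4.3.0.
// The /e modifier was removed in PHP 7, so preg_replace_callback is used here instead.
function unhtmlentities($string)
{
    // replace numeric entities (chr() is only correct for code points up to 255)
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) { return chr(hexdec($m[1])); }, $string);
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) { return chr((int) $m[1]); }, $string);
    // replace literal entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}

// $a holds the entity-encoded input string
$c = unhtmlentities($a);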

(edit) For multibyte support, try the following version, taken from the comments on the PHP manual page:

function unhtmlentities($string)
{
    // replace numeric entities (preg_replace_callback, since the /e modifier was removed in PHP 7)
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) { return code2utf(hexdec($m[1])); }, $string);
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) { return code2utf((int) $m[1]); }, $string);
    // replace literal entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}

// Returns the utf string corresponding to the unicode value (from php.net, courtesy - romans@void.lv)
function code2utf($num)
{
    // 1 byte: plain ASCII
    if ($num < 128) return chr($num);
    // 2 bytes: 110xxxxx 10xxxxxx
    if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return '';
}

// $a holds the entity-encoded input string
$c = unhtmlentities($a);
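
As a sanity check on the bit arithmetic in code2utf, the sketch below works through its three-byte branch by hand for code point 7871 (the entity from the question) and compares the result with what html_entity_decode produces. The variable names are only for illustration.

```php
<?php
// Worked example of the 3-byte branch of code2utf for code point 7871 (U+1EBF).
$num = 7871;

$bytes = [
    ($num >> 12) + 224,        // lead byte, pattern 1110xxxx
    (($num >> 6) & 63) + 128,  // continuation byte, pattern 10xxxxxx
    ($num & 63) + 128,         // continuation byte, pattern 10xxxxxx
];

// Turn the three byte values into a binary string.
$utf8 = implode('', array_map('chr', $bytes));
echo bin2hex($utf8), "\n"; // e1babf

// The built-in decoder yields the same bytes for the same entity.
$builtin = html_entity_decode('&#7871;', ENT_COMPAT, 'UTF-8');
var_dump($utf8 === $builtin); // bool(true)
```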

How do I convert these things back to the original text?
