Question

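I'm working on a website that needs to target old Japanese mobile phones that are not Unicode-enabled. The problem is that the site's text is stored in the database as HTML entities (i.e. &#1234;). That database absolutely cannot be changed, because it is used by several hundred websites.

What I need to do is convert those entities into actual characters and then convert the string encoding before sending it out, because the phones render the entities as-is without converting them first.

I've tried mb_convert_encoding and iconv, but all they do is convert the encoding of the entities themselves rather than produce the text.

Thanks in advance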

Edit:

I have also tried html_entity_decode. It produces the same result: an unconverted string.

Here is the sample data I'm working with.

Desired result: シェラトン・ヌーサリゾート＆スパ

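HTML code: &#12471;&#12455;&#12521;&#12488;&#12531;&#12539;&#12492;&#12540;&#12469;&#12522;&#12478;&#12540;&#12488;&#65286;&#12473;&#12497;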

The output of html_entity_decode([the string above], ENT_COMPAT, 'SHIFT_JIS'); is identical to the input string.

Answer 1

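Note that you need to create the correct code points from the entities, so decode them in an encoding that can represent every one of them and only then convert to the target encoding. If the original encoding is UTF-8, for example: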

$originalEncoding = 'UTF-8'; // that's only assumed, you have not shared the info so far
$targetEncoding = 'SHIFT_JIS';
$string = '... whatever you have ... ';
// superfluous, but to get the picture:
$string = mb_convert_encoding($string, 'UTF-8', $originalEncoding);
$string = html_entity_decode($string, ENT_COMPAT, 'UTF-8');
$stringTarget = mb_convert_encoding($string, $targetEncoding, 'UTF-8');
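
As a worked illustration of the two steps above, here is a minimal sketch that runs the question's sample text through them. It assumes the database stores the text as decimal numeric entities (written out here from the sample characters) and that the mbstring extension is available; the variable names are only illustrative.

```php
<?php
// The question's sample text written as decimal numeric entities
// (assumption: this is how the database stores it).
$entities = '&#12471;&#12455;&#12521;&#12488;&#12531;&#12539;&#12492;&#12540;'
          . '&#12469;&#12522;&#12478;&#12540;&#12488;&#65286;&#12473;&#12497;';

// Step 1: decode the entities into real characters, using UTF-8 because it
// can represent every code point.
$utf8 = html_entity_decode($entities, ENT_COMPAT, 'UTF-8');

// Step 2: re-encode the decoded UTF-8 string as Shift_JIS for the handset.
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

// Inspect the raw bytes rather than echoing them to a UTF-8 terminal.
echo bin2hex($sjis), "\n";
```

When serving the result, the response also has to declare the matching charset (for example with a Content-Type header), otherwise the phone may misread the bytes.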

Answer 2

I found this function on php.net, and it works for my example:

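function unhtmlentities($string) {
    // replace numeric entities; the /e modifier was removed in PHP 7,
    // so callbacks are used instead. mb_chr() (PHP >= 7.2) returns the UTF-8
    // character for a code point, whereas chr() only handles single bytes.
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) {
        return mb_chr(hexdec($m[1]), 'UTF-8');
    }, $string);
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) {
        return mb_chr((int) $m[1], 'UTF-8');
    }, $string);
    // replace named entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}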

Answer 3

I think you just need html_entity_decode.

Edit: based on your edit:

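// create_function() was removed in PHP 8, so a closure is used here
$output = preg_replace_callback('/(&#[0-9]+;)/', function ($m) {
    return mb_convert_encoding($m[1], 'UTF-8', 'HTML-ENTITIES');
}, $original_string);

Note that this is only the first step: it converts the entities into actual (UTF-8) characters. The result still has to be converted to the target encoding.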
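
To show how the remaining step could follow on from that callback, here is a sketch of both steps in one helper. The function name numeric_entities_to_sjis is hypothetical, and the sketch swaps the 'HTML-ENTITIES' conversion for mb_chr(), which assumes PHP 7.2 or later with mbstring. Because the pattern only matches decimal numeric entities, named entities such as &amp;amp; stay escaped, which is usually what you want in HTML output.

```php
<?php
// Hypothetical helper: decode only decimal numeric entities, then convert
// the whole string from UTF-8 to Shift_JIS.
function numeric_entities_to_sjis(string $html): string
{
    // Step 1: replace each decimal entity with its UTF-8 character.
    $utf8 = preg_replace_callback('/&#([0-9]+);/', function (array $m): string {
        $char = mb_chr((int) $m[1], 'UTF-8');
        // Leave the entity untouched if the code point is invalid.
        return $char === false ? $m[0] : $char;
    }, $html);

    // Step 2: convert the decoded text to the target encoding.
    return mb_convert_encoding($utf8, 'SJIS', 'UTF-8');
}

// Usage: the numeric entities are decoded, "&amp;" is left as it is.
$sjis = numeric_entities_to_sjis('&#12473;&#12497; &amp; &#12522;&#12478;&#12540;&#12488;');
```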

Answer 4

Just to chime in, since I ran into some kind of encoding bug of my own while coding, I would suggest this snippet:

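$string_to_encode = " your string ";
// detect the source encoding and pass it on explicitly, so the conversion
// does not silently fall back to the internal encoding
$detected_encoding = mb_detect_encoding($string_to_encode);
if ($detected_encoding !== false) {
    $converted_string = mb_convert_encoding($string_to_encode, 'UTF-8', $detected_encoding);
}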

It is probably not the best choice for large amounts of data, but it works.

Convert HTML entities in UTF-8 to SHIFT_JIS
