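I'm trying to convert a PHP string from UTF-8 to the target encoding (ISO-8859-2). The problem is that the UTF-8 string contains characters that do not exist in ISO-8859-2; they were converted to UTF-8 from windows-1251 (even though they look exactly like native ISO-8859-2 characters). Those characters show up as "?" in the output.
If I convert the same string to windows-1251 instead, those characters come through, but then the characters native to ISO-8859-2 (such as "ä", "ö", etc.) are the ones that go missing.
I'm reading the strings from a MySQL database and need to convert them to a non-Unicode character set and store them in an SQLite database file, because the program that will use them does not support Unicode.
So my question is: is there a way to find out which non-Unicode encodings can represent a given UTF-8 character? I'm currently looping over the whole UTF-8 string and trying to convert each character one by one, but the windows-1251 characters are still missing.
The code is as follows: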
$string = "various charset input";
$str = str_split_unicode($string, 1); // The function from the PHP str_split manual page; splits a UTF-8 string into an array
$handler = "";
foreach ($str as $value):
    $currentChar = iconv("utf-8", "iso-8859-2", $value) or "%no%";
    if ($currentChar == "%no%"):
        $currentChar = "";
        $currentChar = iconv("utf-8", "windows-1251", $value) or "%no%";
    endif;
    if ($currentChar != "%no%"):
        $handler .= $currentChar;
    else:
        $handler .= $value;
    endif;
endforeach;
$string = $handler;
But the question marks are still there.
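One likely contributor to the behaviour of the loop above: in PHP, `or` binds more loosely than `=`, so `$currentChar = iconv(...) or "%no%"` assigns whatever `iconv()` returns (`false` on failure) and never the "%no%" marker, which means the windows-1251 branch appears never to run. Below is a minimal sketch of the intended per-character fallback. It swaps `iconv()` for a round-trip check with `mb_convert_encoding()`, and the helper names `can_encode()` and `to_mixed_charsets()` are hypothetical, not part of PHP.

```php
<?php
// Sketch only: can_encode() and to_mixed_charsets() are hypothetical helpers.

/** True if the UTF-8 character $char survives a round trip through $charset. */
function can_encode(string $char, string $charset): bool
{
    $bytes = mb_convert_encoding($char, $charset, 'UTF-8');
    return mb_convert_encoding($bytes, 'UTF-8', $charset) === $char;
}

/**
 * Convert each character to the first charset in $charsets that supports it.
 * Characters supported by none are kept as UTF-8, as in the loop above.
 */
function to_mixed_charsets(string $utf8, array $charsets): string
{
    $out = '';
    // preg_split() returns false on invalid UTF-8; treat that as "no characters".
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) ?: [] as $char) {
        $converted = $char; // fallback: leave the character as UTF-8
        foreach ($charsets as $charset) {
            if (can_encode($char, $charset)) {
                $converted = mb_convert_encoding($char, $charset, 'UTF-8');
                break;
            }
        }
        $out .= $converted;
    }
    return $out;
}

// "ö" (U+00F6) fits ISO-8859-2; "и" (U+0438) only fits Windows-1251.
$result = to_mixed_charsets("\u{F6}\u{438}", ['ISO-8859-2', 'Windows-1251']);
echo bin2hex($result), "\n"; // prints f6e8
```

The round trip avoids relying on a substitute character or on platform-specific `iconv()` failure behaviour: a character counts as supported only if decoding the converted bytes gives back exactly the original.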
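Update
Thanks to CertaiN. I edited the function you provided (it may have become less readable as a result) so that it converts the characters back into the appropriate encoding.
Function: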
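function utf8_to_multicharset($str, $encoding, $htmSupportedOutput = "iso-8859-15") {
    // Split the UTF-8 string into an array of single characters.
    $utf8 = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    $out = $utf8;
    // Convert every character to the primary encoding; unsupported ones become '?'.
    mb_convert_variables($encoding, 'UTF-8', $out);
    // Accept the fallback charsets as an array or as a comma-separated string.
    is_array($htmSupportedOutput) or $htmSupportedOutput = explode(",", $htmSupportedOutput);
    // The table type and the quote flags are separate arguments.
    $table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES);
    foreach ($out as $i => &$char) {
        if ($char === '?' && $utf8[$i] !== '?') {
            // Not representable in $encoding: turn the original character into an HTML entity.
            $char = mb_convert_encoding($utf8[$i], 'HTML-ENTITIES', 'UTF-8');
        } elseif (isset($table[$char])) {
            $char = $table[$char];
        }
        // Decode entities into the first fallback charset that can represent them.
        foreach ($htmSupportedOutput as $o) {
            // Explicit flags: passing null here is deprecated as of PHP 8.1.
            $char = html_entity_decode($char, ENT_QUOTES, $o);
        }
    }
    unset($char); // break the reference left over from the foreach
    return implode('', $out);
}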
It now goes through the specified list of encodings and converts each character to an encoding that supports it, like this:
PHP usage:
<?php
$string = "variöus charset иnput";
$result = utf8_to_multicharset($string, "iso-8859-2", "cp1252,cp1251,koi8r");
?>
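A byte string that mixes several single-byte charsets no longer records which bytes belong to which charset, so the consumer cannot decode it reliably on its own. As a hedged illustration of keeping that information, the sketch below splits the input into runs and tags each run with the charset chosen for its characters (the first candidate that can represent each one). `split_by_charset()` is a hypothetical helper name, and the segment layout is an assumption made for this example.

```php
<?php
// Sketch only: split_by_charset() is a hypothetical helper, not part of PHP.

/**
 * Split a UTF-8 string into runs of adjacent characters that were assigned
 * the same charset. Each character gets the first charset in $charsets that
 * can represent it; characters that fit none are tagged 'UTF-8' and kept as-is.
 */
function split_by_charset(string $utf8, array $charsets): array
{
    $segments = [];
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) ?: [] as $char) {
        $chosen = 'UTF-8';
        $bytes  = $char;
        foreach ($charsets as $charset) {
            $candidate = mb_convert_encoding($char, $charset, 'UTF-8');
            // Round trip: only accept the charset if nothing was substituted.
            if (mb_convert_encoding($candidate, 'UTF-8', $charset) === $char) {
                $chosen = $charset;
                $bytes  = $candidate;
                break;
            }
        }
        $last = count($segments) - 1;
        if ($last >= 0 && $segments[$last]['charset'] === $chosen) {
            $segments[$last]['bytes'] .= $bytes; // extend the current run
        } else {
            $segments[] = ['charset' => $chosen, 'bytes' => $bytes];
        }
    }
    return $segments;
}

$input = "vari\u{F6}us \u{438}nput";
foreach (split_by_charset($input, ['ISO-8859-2', 'Windows-1251']) as $seg) {
    echo $seg['charset'], ': ', bin2hex($seg['bytes']), "\n";
}
```

For this input the result has three runs: the text up to the Cyrillic letter in ISO-8859-2, the single Cyrillic letter in Windows-1251, and the remaining ASCII letters in ISO-8859-2 again.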
Answer 0 (score: 0)
Do you need HTML entity encoding?
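function utf8_to_escaped_another($str, $encoding) {
    // Split the UTF-8 string into single characters.
    $utf8 = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    $out = $utf8;
    // Convert each character to $encoding; unrepresentable ones become '?'.
    mb_convert_variables($encoding, 'UTF-8', $out);
    // The table type and the quote flags are separate arguments.
    $table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES);
    foreach ($out as $i => &$char) {
        if ($char === '?' && $utf8[$i] !== '?') {
            // Fall back to an HTML entity built from the original character.
            $char = mb_convert_encoding($utf8[$i], 'HTML-ENTITIES', 'UTF-8');
        } elseif (isset($table[$char])) {
            // Escape HTML special characters (&, <, >, quotes).
            $char = $table[$char];
        }
    }
    unset($char); // break the reference left over from the foreach
    return implode('', $out);
}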
Example:
<?php
header('Content-Type: text/html; charset=ISO-8859-2');
$text = <<<EOD
English: Good Morning
Arabic: صباح الخير
Japanese: おはよう
EOD;
echo '<pre>';
echo utf8_to_escaped_another($text, 'ISO-8859-2');
echo '</pre>';
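Browser view:
English: Good Morning
Arabic: صباح الخير
Japanese: おはよう
Page source:
<pre>English: Good Morning
Arabic: &#1589;&#1576;&#1575;&#1581; &#1575;&#1604;&#1582;&#1610;&#1585;
Japanese: &#12362;&#12399;&#12424;&#12358;</pre>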
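The function above relies on the 'HTML-ENTITIES' pseudo-encoding of mbstring, which has been deprecated since PHP 8.2. A possible variant, offered only as a sketch, builds the numeric entities with `mb_encode_numericentity()` and detects unsupported characters with a round trip instead of comparing against '?'. The name `utf8_to_escaped_numeric()` is hypothetical, and unlike 'HTML-ENTITIES' this variant always emits decimal numeric entities, never named ones.

```php
<?php
// Sketch only: utf8_to_escaped_numeric() is a hypothetical name.

function utf8_to_escaped_numeric(string $str, string $encoding): string
{
    // Conversion map [start, end, offset, mask]: encode any code point given.
    $map = [0x0, 0x10FFFF, 0, 0x1FFFFF];
    $out = '';
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [] as $char) {
        $bytes = mb_convert_encoding($char, $encoding, 'UTF-8');
        if (mb_convert_encoding($bytes, 'UTF-8', $encoding) === $char) {
            // Representable: escape HTML specials while still in UTF-8
            // (the escapes are plain ASCII), then convert to the target.
            $escaped = htmlspecialchars($char, ENT_QUOTES, 'UTF-8');
            $out .= mb_convert_encoding($escaped, $encoding, 'UTF-8');
        } else {
            // Not representable: emit a decimal numeric entity instead.
            $out .= mb_encode_numericentity($char, $map, 'UTF-8');
        }
    }
    return $out;
}

echo utf8_to_escaped_numeric("a<\u{304A}", 'ISO-8859-2'), "\n"; // prints a&lt;&#12362;
```

Escaping is done on the UTF-8 character rather than on the converted byte because `htmlspecialchars()` does not list ISO-8859-2 among its supported charsets.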