Question

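I have an RSS feed that is generated from user-entered data. Many users enter text in Japanese, and most of the time that works fine. However, one particular RSS feed shows this error: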

error on line 25 at column 25: Input is not proper UTF-8, indicate encoding !
Bytes: 0x0B 0x32 0x38 0x20

Note that this is not the first place where Japanese characters appear in this particular RSS feed.

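I have seen other answers, such as Error: "Input is not proper UTF-8, indicate encoding !" using PHP's simplexml_load_string, which suggest changing the encoding, among other things. But I am confused about why the encoding fails only on this particular feed. And if it is because this one person's Japanese input is encoded differently, how can I detect that someone's input uses a different encoding and selectively fix only the entries that might cause the problem?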

Edit: Following this article: http://www.localizingjapan.com/blog/2012/01/30/detecting-and-conveting-japanese-multibyte-encodings-in-php/

I tried adding the following:

if (!mb_check_encoding($content, "UTF-8")) {
    $content = mb_convert_encoding($content, "UTF-8",
        "Shift-JIS, EUC-JP, JIS, SJIS, JIS-ms, eucJP-win, SJIS-win, ISO-2022-JP,
         ISO-2022-JP-MS, SJIS-mac, SJIS-Mobile#DOCOMO, SJIS-Mobile#KDDI,
         SJIS-Mobile#SOFTBANK, UTF-8-Mobile#DOCOMO, UTF-8-Mobile#KDDI-A,
         UTF-8-Mobile#KDDI-B, UTF-8-Mobile#SOFTBANK, ISO-2022-JP-MOBILE#KDDI");
}

However, the feed is still reported as not being properly encoded as UTF-8.

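Edit 2: I am now very confused, because I logged what encoding mb_detect_encoding thinks the text is, and everything comes back as either ASCII (which must be the other fields, since Japanese obviously cannot be ASCII) or UTF-8. Do you know why the text would be detected as UTF-8 and yet still produce these encoding errors?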

Answer 1

Make sure that you have correctly encoded the user input as UTF-8.

http://php.net/manual/de/function.utf8-encode.php

string utf8_encode ( string $data )
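
A minimal sketch of what "correctly encoded" could mean in practice for this feed. It rests on two assumptions that the question does not confirm. First, input that fails the UTF-8 check is treated as ISO-8859-1, which is the same assumption utf8_encode() makes (that function is deprecated as of PHP 8.2, so the sketch uses mb_convert_encoding() instead); for Japanese users a different source encoding may well be the right one. Second, the reported bytes start with 0x0B, a vertical tab, which is valid UTF-8 but is not a character XML 1.0 permits, so the sketch also strips such characters on the guess that this is what the parser is rejecting. The name sanitize_for_xml is hypothetical, and the mbstring extension is assumed to be loaded.

```php
<?php
/**
 * Hypothetical helper: return $text as valid UTF-8 with every character
 * that XML 1.0 forbids removed.
 */
function sanitize_for_xml(string $text): string
{
    // Assumption: bytes that are not valid UTF-8 are ISO-8859-1,
    // mirroring what utf8_encode() would have assumed.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
    }

    // XML 1.0 permits only tab, LF, CR and characters from U+0020 upward,
    // excluding surrogates, U+FFFE and U+FFFF. Remove everything else,
    // which includes the vertical tab (0x0B).
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    // preg_replace() returns null on failure; keep the input in that case.
    return $clean ?? $text;
}

// Usage: sanitize first, then escape &, < and > before writing the element.
$safe = htmlspecialchars(
    sanitize_for_xml("日本語\x0B28 "),
    ENT_XML1 | ENT_QUOTES,
    'UTF-8'
);
```

Running the sanitizer on every field just before it is written into the feed would avoid having to detect which user's input is affected, since text that is already clean passes through unchanged.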

RSS feed encoding
