I'm trying to convert this into readable UTF-8 text in PHP:
Tel Aviv-Yafo (Hebrew: \u05ea\u05b5\u05bc\u05dc\u05be\u05d0\u05b8\u05d1\u05b4\u05d9\u05d1-\u05d9\u05b8\u05e4\u05d5\u05b9; Arabic: \u062a\u0644 \u0623\u0628\u064a\u0628\u200e, Tall \u02bcAb\u012bb), usually called Tel Aviv
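Any ideas on how to do this?
I've tried several approaches I found online, but none of them worked.
In this case the text contains Hebrew and Arabic Unicode escapes.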
Answer 0 (score: 6)
None of the other answers is quite right. I combined them, and with my own addition the result is this:
// Accept both lowercase and uppercase hex digits after \u
$replacedString = preg_replace("/\\\\u([0-9a-fA-F]{4})/", "&#x$1;", $originalString);
$unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');
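This one actually works :)
Answer 1 (score: 2)
It doesn't always work, because the \uXXXX codes can contain letters as well as digits. Try replacing \d (digits only) with \w (which matches letters and digits).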
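On PHP 8.2 and later, passing 'HTML-ENTITIES' to mb_convert_encoding() is deprecated. A minimal sketch of the same two-step idea that decodes the entities with html_entity_decode() instead; the function name unicode_conv_entities is hypothetical, and it assumes every escape is a single BMP code point (surrogate pairs are not combined):

```php
<?php
// Sketch: rewrite each \uXXXX escape as a hex character reference, then decode it.
// Hypothetical helper name; assumes each escape is one BMP code point (no surrogate pairs).
function unicode_conv_entities(string $originalString): string
{
    // '\\\\u' in a single-quoted PHP string is the regex \\u, i.e. a literal "\u".
    $replaced = preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x$1;', $originalString);

    // Note: html_entity_decode() also decodes any other entities (such as &amp;)
    // that were already present in the text.
    return html_entity_decode($replaced ?? $originalString, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo unicode_conv_entities('Arabic: \u062a\u0644 \u0623\u0628\u064a\u0628'), PHP_EOL; // Arabic: تل أبيب
```

Unlike the original pattern, the character class here also accepts uppercase hex digits, so an escape like \u05EA is converted too.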
function unicode_conv($originalString) {
// The four \\\\ in the pattern here are necessary to match \u in the original string
$replacedString = preg_replace("/\\\\u(\w{4})/", "&#$1;", $originalString);
$unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');
return $unicodeString;
}
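To see the trade-off between the character classes used in this thread: \w also matches underscores and the letters g to z, so a pattern built on it can fire on text that is not an escape at all, such as a Windows path. A small sketch that counts how many "\u plus four characters" sequences each class accepts (count_escapes is a hypothetical helper, not part of any library):

```php
<?php
// Count the "\u" + 4-character sequences that a given character class accepts.
function count_escapes(string $class, string $subject): int
{
    // '\\\\u' is the regex \\u, which matches a literal backslash followed by "u".
    return preg_match_all('/\\\\u' . $class . '{4}/', $subject);
}

$escaped = '\u05ea\u0644\u00a0'; // three genuine escapes
$path    = 'C:\users\bob';       // no escapes, just a path

foreach (['\d', '\w', '[0-9a-fA-F]'] as $class) {
    printf("%-12s escapes: %d  path: %d\n",
        $class, count_escapes($class, $escaped), count_escapes($class, $path));
}
```

\d misses every escape that contains a hex letter, while \w also matches "\users"; an explicit hex class accepts only real escapes.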
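Answer 2 (score: 1)
See this comment for a way to get a Unicode character from its numeric code. You can then write a regex replacement that replaces each \uXXXX pattern with the equivalent character.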
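A minimal sketch of that first approach, assuming PHP 7.2+ with the mbstring extension (for mb_chr()). It converts each hex code straight to a UTF-8 character without going through HTML entities. The function name unicode_conv_chr is hypothetical, and surrogate pairs (escapes for characters outside the BMP) are not combined:

```php
<?php
// Replace each \uXXXX escape with the UTF-8 character for that code point.
function unicode_conv_chr(string $input): string
{
    $result = preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        static function (array $m): string {
            // hexdec() turns "05ea" into 1514; mb_chr() builds the UTF-8 character.
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr() returns false for code points invalid in UTF-8 (e.g. lone
            // surrogates D800-DFFF); keep the original escape in that case.
            return $char === false ? $m[0] : $char;
        },
        $input
    );
    return $result ?? $input;
}

echo unicode_conv_chr('Hebrew: \u05ea\u05dc'), PHP_EOL;
```

A side benefit of the callback is that HTML entities already present in the text are left untouched.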
Alternatively, you can replace each \uXXXX pattern with the matching &#XXXX; HTML entity form and then use the following:
mb_convert_encoding($stringWithHtmlEntities, 'UTF-8', 'HTML-ENTITIES');
A more complete example:
// The four \\\\ in the pattern here are necessary to match \u in the original string
$replacedString = preg_replace("/\\\\u(\d{4})/", "&#$1;", $originalString);
$unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');
Answer 3 (score: 1)
You should add an 'x' after the '#' in the replacement string to indicate that the digits are hexadecimal.
$replacedString = preg_replace("/\\\\u(\d{4})/", "&#x$1;", $originalString);
$unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');
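The effect of the 'x' can be checked by decoding the same four digits both ways. This small sketch uses html_entity_decode() (core PHP) only to keep the example free of mbstring; the variable names are illustrative:

```php
<?php
// Decode the same four digits as a hexadecimal and as a decimal character reference.
$hex = html_entity_decode('&#x0644;', ENT_QUOTES | ENT_HTML5, 'UTF-8'); // code point 0x644
$dec = html_entity_decode('&#0644;', ENT_QUOTES | ENT_HTML5, 'UTF-8');  // code point 644 = 0x284

// 0x644 is ARABIC LETTER LAM; decimal 644 is U+0284, an unrelated IPA Extensions letter.
var_dump($hex === "\u{0644}"); // bool(true)
var_dump($dec === "\u{0284}"); // bool(true)
echo bin2hex($hex), ' ', bin2hex($dec), PHP_EOL; // d984 ca84
```

Without the 'x', every escape that happens to consist only of digits is decoded to a different, unrelated character.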
Answer 4 (score: 1)
I recently ran into the same problem and was glad to find this question. After some testing, I found that the following code works:
$replacedString = preg_replace("/\\\\u([0-9a-fA-F]{4})/", "&#x$1;", $original_string);
//$unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');
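The only change I made was to comment out the second line of code. However, the web page must be set to display UTF-8.
Enjoy!
Answer 5 (score: 0)
I'm trying this code: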
function unicode_conv($originalString) {
// The four \\\\ in the pattern here are necessary to match \u in the original string
$replacedString = preg_replace("/\\\\u(\d{4})/", "&#$1;", $originalString);
$unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');
return $unicodeString;
}
echo unicode_conv("Tel Aviv-Yafo (Hebrew: \u05ea\u05b5\u05bc\u05dc\u05be\u05d0\u05b8\u05d1\u05b4\u05d9\u05d1-\u05d9\u05b8\u05e4\u05d5\u05b9; Arabic: \u062a\u0644 \u0623\u0628\u064a\u0628\u200e, Tall \u02bcAb\u012bb), usually called Tel Aviv, is the second largest city in Israel, with an estimated population of 393,900. The city is situated on the Israeli Mediterranean coast, with a land area of 51.8\u00a0square kilometres (20.0\u00a0sq\u00a0mi). It is the largest and most populous city in the metropolitan area of Gush Dan, home to 3.15\u00a0million people as of 2008. The city is governed by the Tel Aviv-Yafo municipality, headed by Ron Huldai.\nTel Aviv was founded in 1909 on the outskirts of the ancient port city of Jaffa (Hebrew: \u05d9\u05b8\u05e4\u05d5\u05b9\u200e, Yafo; Arabic: \u064a\u0627\u0641\u0627\u200e, Yaffa). The growth of Tel Aviv soon outpaced Jaffa, which was largely Arab at the time. Tel Aviv and Jaffa were merged into a single municipality in 1950, two years after the establishment of the State of Israel. Tel Aviv's White City, designated a UNESCO World Heritage Site in 2003, comprises the world's largest concentration of Modernist-style buildings.\nTel Aviv is classified as a beta+...");
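The result is incorrect: it barely changes anything, and some letters turn into Greek/Russian instead of Hebrew/Arabic.
It's as if the entity numbers are wrong.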