I'm not sure which part of my script is actually at fault, but I'm having trouble parsing tweet text that contains Unicode characters.
Sample tweet:
Landsliðsmaður með viti. #rafhlaða #hræddur http://t.co/ci03F3vUNM
When I fetch it with twitteroauth and save it to a .txt file, the string ends up in the file like this:
Landsli\u00f0sma\u00f0ur me\u00f0 viti. #rafhla\u00f0a #hr\u00e6ddur http:\/\/t.co\/ci03F3vUNM
I use a few simple preg_replace calls to turn URLs, mentions and hashtags in the text into hyperlinks:
function twitterify($ret) {
    $ret = preg_replace("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t< ]*)#", "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>", $ret);
    $ret = preg_replace("#(^|[\n ])((www|ftp)\.[^ \"\t\n\r< ]*)#", "\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>", $ret);
    $ret = preg_replace("/@(\w+)/", "<a href=\"http://www.twitter.com/\\1\" target=\"_blank\">@\\1</a>", $ret);
    $ret = preg_replace("/#(\w+)/", "<a href=\"http://search.twitter.com/search?q=\\1\" target=\"_blank\">#\\1</a>", $ret);
    return $ret;
}
But it fails as soon as it hits a Unicode character:
#rafhlaða
becomes <a href="#">#rafhla</a>ða
#hræddur
becomes <a href="#">#hr</a>æddur
and so on.
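The symptom can be reproduced without Twitter or the text file. This is a minimal sketch, assuming the hashtag is already UTF-8 by the time it reaches the regex; the variable names are only for illustration:

<?php
// Assumption: the hashtag has already been decoded to UTF-8.
// "\xC3\xB0" is the UTF-8 byte sequence for "ð", so $tag is "#rafhlaða".
$tag = "#rafhla\xC3\xB0a";

// Same hashtag pattern as in twitterify(), with no modifier.
preg_match("/#(\w+)/", $tag, $m);

// The capture ends right before the first non-ASCII character.
echo $m[1], "\n";

// Whatever the pattern left behind, shown as hex bytes.
$rest = substr($tag, strlen("#" . $m[1]));
echo bin2hex($rest), "\n";

With the default locale this prints rafhla followed by c3b061, so the match stops at the first byte of the two-byte character.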
Where am I going wrong? Is it in how I save/open the text file with PHP, or in how I parse the Unicode-encoded string?
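Answer 0 (score: 1)
Look at the code below: I added the u modifier to the end of every regular expression, and it works. The u modifier makes PCRE treat the pattern and the subject as UTF-8, so \w also matches letters such as ð and æ. Save the file as UTF-8. If you have a JSON-encoded string, you can decode it with the solution from: Php/json: decode utf8?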
<?php
// Convert one \uXXXX escape (four hex digits in $matches[1]) to UTF-8.
function ewchar_to_utf8($matches) {
    $ewchar = $matches[1];
    $binwchar = hexdec($ewchar);
    // Pack the code point as two big-endian bytes.
    $wchar = chr(($binwchar >> 8) & 0xFF) . chr(($binwchar) & 0xFF);
    return iconv("unicodebig", "utf-8", $wchar);
}

// Replace every \uXXXX escape in $str with the matching UTF-8 character.
function special_unicode_to_utf8($str) {
    return preg_replace_callback("/\\\u([[:xdigit:]]{4})/i", "ewchar_to_utf8", $str);
}

$text = 'Landsli\u00f0sma\u00f0ur me\u00f0 viti. #rafhla\u00f0a #hr\u00e6ddur http:\/\/t.co\/ci03F3vUNM';
$text = special_unicode_to_utf8($text);

// Same replacements as in the question, with the u modifier on each pattern.
function twitterify($ret) {
    $ret = preg_replace("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t< ]*)#u", "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>", $ret);
    $ret = preg_replace("#(^|[\n ])((www|ftp)\.[^ \"\t\n\r< ]*)#u", "\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>", $ret);
    $ret = preg_replace("/@(\w+)/u", "<a href=\"http://www.twitter.com/\\1\" target=\"_blank\">@\\1</a>", $ret);
    $ret = preg_replace("/#(\w+)/u", "<a href=\"http://search.twitter.com/search?q=\\1\" target=\"_blank\">#\\1</a>", $ret);
    return $ret;
}

$text = twitterify($text);
print $text;
Prints:
Landsliðsmaður með viti. <a href="http://search.twitter.com/search?q=rafhlaða" target="_blank">#rafhlaða</a> <a href="http://search.twitter.com/search?q=hræddur" target="_blank">#hræddur</a> http:\/\/t.co\/ci03F3vUNM
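Note that the printed link still contains http:\/\/, because only the \uXXXX escapes are converted. If the saved text really is the inside of a JSON string, json_decode might be a simpler way to undo all the escapes at once. This is a rough sketch under that assumption; the variable names $raw, $decoded and $tags are made up for illustration, and it will not work if the text contains unescaped double quotes:

<?php
// The text exactly as it appears in the saved file, JSON escapes intact.
$raw = 'Landsli\u00f0sma\u00f0ur me\u00f0 viti. #rafhla\u00f0a #hr\u00e6ddur http:\/\/t.co\/ci03F3vUNM';

// Assumption: $raw is the inside of a JSON string literal with no
// unescaped double quotes, so wrapping it in quotes gives valid JSON.
$decoded = json_decode('"' . $raw . '"');

if (!is_string($decoded)) {
    // Decoding failed; fall back to the raw text rather than lose it.
    $decoded = $raw;
}

// Both the \uXXXX escapes and the \/ escapes are resolved now.
echo $decoded, "\n";

// With UTF-8 text, the u modifier lets \w match the non-ASCII letters.
preg_match_all('/#(\w+)/u', $decoded, $tags);
print_r($tags[1]);

After decoding, the URL reads http://t.co/ci03F3vUNM, so the URL pattern in twitterify() can match it as well.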