I have a file containing several lines of Chinese words, separated by commas, like this:
你,我,他,好,但,中,国,龙
好,把,是,的,啊,人,吖,哦
I want to load them into an array with the code below; later I will use this array to find the Chinese words that an article contains:
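$ds = file($Dictionary, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES); // one entry per line, trailing newlines stripped
$_SP_ = chr(0xFF).chr(0xFE); // two-byte separator
$array = array();
foreach ($ds as $d)
{
    $spstr = $_SP_;
    $spstr = iconv('ucs-2be', 'utf-8', $spstr); // separator re-encoded as UTF-8
    $ws = explode(',', $d); // array of single Chinese words
    $wall = iconv('utf-8', 'ucs-2be', join($spstr, $ws)); // what is $wall used for?
    $ws = explode($_SP_, $wall); // split again on the raw two-byte separator
    foreach ($ws as $estr)
    {
        $array[$estr] = strlen($estr); // key: word bytes, value: length in bytes
    }
}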
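For reference, here is a minimal sketch that runs the same steps on a single hard-coded line and records each resulting piece in hex together with its strlen. The sample line and the names $line, $sep, $sepUtf8 and $rows are mine and not part of the original code; it assumes the iconv extension is enabled and that the script itself is saved as UTF-8.

```php
<?php
// Hypothetical sample standing in for one line of the dictionary file.
$line = "你,我,他";

$sep     = chr(0xFF) . chr(0xFE);              // raw two-byte separator
$sepUtf8 = iconv('UCS-2BE', 'UTF-8', $sep);    // same code unit, encoded as UTF-8

// Join the words with the UTF-8 separator, then convert everything to UCS-2BE.
$joined = iconv('UTF-8', 'UCS-2BE', implode($sepUtf8, explode(',', $line)));

// Split on the raw separator and record "hex:byte-length" for each piece.
$rows = array();
foreach (explode($sep, $joined) as $word) {
    $rows[] = bin2hex($word) . ':' . strlen($word);
}

print_r($rows); // 4f60:2, 6211:2, 4ed6:2
```

If this sketch matches what the original loop does, the keys of $array are UCS-2BE byte strings rather than UTF-8 text, and the stored length is a byte count (2 per character here).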
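My questions:
What does $_SP_ = chr(0xFF).chr(0xFE) mean? As far as I can tell, chr(0xFF).chr(0xFE) is a string built from the two highest byte values (both beyond the 7-bit ASCII range). What is the combination of the two used for?
Why should I convert $_SP_ from ucs-2be to utf-8?
Why is $ws joined back into a string, but separated by the utf-8 form of chr(0xFF).chr(0xFE)?
Why does it need the length of each word?
Why is $spstr treated as UCS-2BE? Is it only because it is the combination chr(0xFF).chr(0xFE)?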
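To make the first and fourth questions concrete, here is a minimal sketch that only inspects the bytes involved. The variable names are mine, and it assumes the iconv extension is enabled and the script is saved as UTF-8; it does not claim to explain why the original author chose this separator.

```php
<?php
$sep = chr(0xFF) . chr(0xFE);

// Read as one big-endian 16-bit unit, the two bytes form the code point U+FFFE.
$sepHex     = bin2hex($sep);                                // "fffe"
$sepUtf8Hex = bin2hex(iconv('UCS-2BE', 'UTF-8', $sep));     // "efbfbe"

// strlen() counts bytes, so the same character has a different
// length depending on the encoding it is stored in.
$utf8Len = strlen('龙');                                    // 3 bytes in UTF-8
$ucs2Len = strlen(iconv('UTF-8', 'UCS-2BE', '龙'));         // 2 bytes in UCS-2BE

echo "$sepHex $sepUtf8Hex $utf8Len $ucs2Len\n";
```

So the length stored for each word depends on which encoding the word is in at the moment strlen is called.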