PHP - RecursiveDirectoryIterator + RecursiveIteratorIterator + RegexIterator errors with mixed-charset file names (Latin, Japanese, Korean)

Date: 2013-11-07 14:11:00

Tags: php json character-encoding multilingual jplayer

I am reading my music directory to populate the JSON for jPlayer, like this:

<?php
//tried utf-8, shift_jis, etc. No difference
header('Content-Type: application/json; charset=SHIFT_JIS');

// can't be blank, so I use . to make the current directory the base
$Directory = new RecursiveDirectoryIterator('.');
$Iterator = new RecursiveIteratorIterator($Directory);
$Regex = new RegexIterator($Iterator, '/^.+\.mp3$/i', RecursiveRegexIterator::GET_MATCH);
// used instead of glob('*/*.mp3') because glob isn't recursive

$filesJson = [];

foreach ($Regex as $key => $value) {
    $whatever = str_ireplace(['.mp3','.\\'], '', $key);
    $filesJson['mp3'][] = [
        'title' => htmlspecialchars($whatever),
        'mp3' => $key
    ];

}
echo json_encode($filesJson);
exit();
?>

The problem is with file names that are not standard UTF-8, such as Latin, Japanese, and Korean names. Examples:

[Screenshots: Japanese, Korean, and Latin (pt-br) file names]

When parsed, such names (for example the Latin name Geração) have their characters converted to ? or simply become null.


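So, how do I correctly parse file names/paths in different languages? The charset header did not help.

Info:

XAMPP with Apache2 + PHP 5.4.2 on Win7 x86


Update #1:

Tried @infinity's answer, but nothing changed. Japanese names still come out as ? and Latin names as null.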

<?php
header('Content-Type: application/json; charset=UTF-8');
mb_internal_encoding('UTF-8');

$Directory = new RecursiveDirectoryIterator('.');
$Iterator = new RecursiveIteratorIterator($Directory);
$Regex = new RegexIterator($Iterator, '/^.+\.mp3$/i', RecursiveRegexIterator::GET_MATCH);

$filesJson = [];

foreach ($Regex as $key => $value) {
    $whatever = mb_substr($key, 2, mb_strlen($key)-6, "utf-8"); // 2 to remove .\ and -6 to remove .mp3 (-4 + -2)
    $filesJson['mp3'][] = [
        'title' => $whatever, //tried with and without htmlspecialchars
        'mp3' => $key
    ];

}
echo json_encode($filesJson);
exit();
?>

If I pass HTML-ENTITIES instead of utf-8 to mb_substr(), Latin characters work, but Asian characters are still ?.

4 answers:

Answer 0: (score: 1)

<?php
header('Content-Type: application/json; charset=utf-8');
mb_internal_encoding('utf-8');

foreach ($Regex as $key => $value) {
    // Use $key here ($str was undefined) and drop the trailing ".mp3"
    $whatever = mb_substr($key, 0, mb_strlen($key, "utf-8") - 4, "utf-8");
    // ... rest of code
}

Answer 1: (score: 1)

A quick attempt at a recursive approach using dir():

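myRecursiveScanDir($mypath);

function myRecursiveScanDir($path)
{
    $d = dir($path);
    while (false !== ($entry = $d->read())) {
        // Skip the current and parent directory entries to avoid infinite recursion
        if ($entry === '.' || $entry === '..') {
            continue;
        }

        // Do something, e.g. just echo it
        echo $path."/".$entry."<br/>";

        // Descend into subdirectories
        if (is_dir($path."/".$entry)) {
            myRecursiveScanDir($path."/".$entry);
        }
    }
    $d->close();
}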

Getting the file extension and/or basename may also be a bit problematic. You may have to debug and test how mb_substr, pathinfo and basename react to such file names.
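As a starting point for that kind of testing, here is a minimal sketch of stripping the extension by characters rather than bytes. The helper name stripExtension is made up for illustration, and the sketch assumes the names are already valid UTF-8 and that this source file is saved as UTF-8. It avoids basename() and pathinfo() because the PHP manual describes both as locale-aware.

```php
<?php
// Hypothetical helper (not from the original post): remove a trailing
// extension by counting characters rather than bytes.
function stripExtension($name, $ext = '.mp3')
{
    $extLen = mb_strlen($ext, 'UTF-8');
    $tail = mb_substr($name, -$extLen, $extLen, 'UTF-8');

    // Compare case-insensitively so ".MP3" is handled too.
    if (mb_strtolower($tail, 'UTF-8') !== mb_strtolower($ext, 'UTF-8')) {
        return $name; // no matching extension, leave the name unchanged
    }

    return mb_substr($name, 0, mb_strlen($name, 'UTF-8') - $extLen, 'UTF-8');
}

// Assumes this source file itself is saved as UTF-8.
echo stripExtension('Geração.mp3'), PHP_EOL;
echo stripExtension('日本語.MP3'), PHP_EOL;
```

If the names are not valid UTF-8 to begin with, the mb_* calls will not give meaningful results, so the encoding has to be sorted out first.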

Answer 2: (score: 1)

The operating system you are using may matter in this case:

See this question: Why does Windows need to `utf8_decode` filenames for `file_get_contents` to work?

I think it may be relevant because the screenshots look very Microsoft.
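To illustrate the idea, here is a rough sketch that assumes the file names arrive in a single-byte Windows code page (Windows-1252 is a guess for a pt-br system) and converts them to UTF-8 before they reach json_encode(), which expects UTF-8 input. The helper name toUtf8FileName and the sample bytes are hypothetical.

```php
<?php
// Hypothetical helper: convert a file name from an ASSUMED Windows ANSI
// code page to UTF-8 so that json_encode() can handle it.
function toUtf8FileName($name, $sourceEncoding = 'Windows-1252')
{
    // Leave names that are already valid UTF-8 untouched.
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }
    return mb_convert_encoding($name, 'UTF-8', $sourceEncoding);
}

// "Geração.mp3" written as Windows-1252 bytes (sample data for illustration).
$raw = "Gera\xE7\xE3o.mp3";
$utf8 = toUtf8FileName($raw);

// JSON_UNESCAPED_UNICODE (PHP 5.4+) keeps the accented characters readable.
echo json_encode(['mp3' => $utf8], JSON_UNESCAPED_UNICODE), PHP_EOL;
```

This would only help with names that fit in the system code page. If the Japanese or Korean characters have already been replaced by ? when PHP reads the directory, no conversion afterwards can bring them back.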

Answer 3: (score: 0)

Match any letter/number:


\p{L}\p{N}
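
A minimal sketch of how these property classes might be used with preg_match(). The pattern and the sample names are made up for illustration, and the sketch assumes both this source file and the file names are UTF-8. The u modifier is what makes PCRE treat the pattern and the subject as UTF-8.

```php
<?php
// Assumes this source file is saved as UTF-8.
// Hypothetical pattern: letters, digits, spaces and a few punctuation
// marks, ending in .mp3 (case-insensitive, UTF-8 mode).
$pattern = '/^[\p{L}\p{N} ._-]+\.mp3$/iu';

$names = ['Geração.mp3', '日本語の曲.mp3', 'notes.txt'];
foreach ($names as $name) {
    $result = preg_match($pattern, $name);
    echo $name, ': ', ($result === 1 ? 'match' : 'no match'), PHP_EOL;
}
```

Note that with the u modifier preg_match() returns false when the subject is not valid UTF-8, so this does not fix names that come back from the file system in another encoding.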