PHP如何使用不正确的json格式解析文件

时间:2013-10-24 11:39:25

标签: php json parsing

我需要解析一个看起来像JSON文件的文件,但事实并非如此。它错过了: char,所以我无法使用json_decode解析它。我不是这个文件的拥有者,所以我必须把它当作它...我怎么解析这个文件?有什么想法吗?谢谢

"sound_materials"
{
    "common"
    {
        "value"     "0"
        "start_drag_sound"      "ui.inv_pickup"
        "end_drag_sound"        "ui.inv_drop"
        "equip_sound"       "ui.inv_equip"
    }
    "chest"
    {
        "value"     "1"
        "start_drag_sound"      "ui.inv_pickup_chest"
        "end_drag_sound"        "ui.inv_drop_chest"
    }
    "pennant"
    {
        "value"     "2"
        "start_drag_sound"      "ui.inv_pickup_pennant"
        "end_drag_sound"        "ui.inv_drop_pennant"
    }
    "key"
    {
        "value"     "3"
        "start_drag_sound"      "ui.inv_pickup_key"
        "end_drag_sound"        "ui.inv_drop_key"
    }
    "metal_small"
    {
        "value"     "4"
        "start_drag_sound"      "ui.inv_pickup_metalsmall"
        "end_drag_sound"        "ui.inv_drop_metalsmall"
        "equip_sound"       "ui.inv_equip_metalsmall"
    }
    "metal_armor"
    {
        "value"     "5"
        "start_drag_sound"      "ui.inv_pickup_metalarmour"
        "end_drag_sound"        "ui.inv_drop_metalarmour"
        "equip_sound"       "ui.inv_equip_metalarmour"
    }
    "metal_blade"
    {
        "value"     "6"
        "start_drag_sound"      "ui.inv_pickup_metalblade"
        "end_drag_sound"        "ui.inv_drop_metalblade"
        "equip_sound"       "ui.inv_equip_metalblade"
    }
    "metal_heavy"
    {
        "value"     "7"
        "start_drag_sound"      "ui.inv_pickup_metalheavy"
        "end_drag_sound"        "ui.inv_drop_metalheavy"
        "equip_sound"       "ui.inv_equip_metalheavy"
    }
    "staff_or_blunt"
    {
        "value"     "8"
        "start_drag_sound"      "ui.inv_pickup_staff"
        "end_drag_sound"        "ui.inv_drop_staff"
        "equip_sound"       "ui.inv_equip_staff"
    }
    "robes"
    {
        "value"     "9"
        "start_drag_sound"      "ui.inv_pickup_robes"
        "end_drag_sound"        "ui.inv_drop_robes"
        "equip_sound"       "ui.inv_equip_robes"
    }
    "leather"
    {
        "value"     "10"
        "start_drag_sound"      "ui.inv_pickup_leather"
        "end_drag_sound"        "ui.inv_drop_leather"
        "equip_sound"       "ui.inv_equip_leather"
    }
    "quiver"
    {
        "value"     "11"
        "start_drag_sound"      "ui.inv_pickup_quiver"
        "end_drag_sound"        "ui.inv_drop_quiver"
        "equip_sound"       "ui.inv_equip_quiver"
    }
    "stone"
    {
        "value"     "12"
        "start_drag_sound"      "ui.inv_pickup_stone"
        "end_drag_sound"        "ui.inv_drop_stone"
        "equip_sound"       "ui.inv_equip_stone"
    }
    "wood"
    {
        "value"     "13"
        "start_drag_sound"      "ui.inv_pickup_wood"
        "end_drag_sound"        "ui.inv_drop_wood"
        "equip_sound"       "ui.inv_equip_wood"
    }
    "bone"
    {
        "value"     "14"
        "start_drag_sound"      "ui.inv_pickup_bone"
        "end_drag_sound"        "ui.inv_drop_bone"
        "equip_sound"       "ui.inv_equip_bone"
    }
    "jug"
    {
        "value"     "15"
        "start_drag_sound"      "ui.inv_pickup_jug"
        "end_drag_sound"        "ui.inv_drop_jug"
        "equip_sound"       "ui.inv_equip_jug"
    }
    "gun"
    {
        "value"     "16"
        "start_drag_sound"      "ui.inv_pickup_gun"
        "end_drag_sound"        "ui.inv_drop_gun"
        "equip_sound"       "ui.inv_equip_gun"
    }
    "highvalue"
    {
        "value"     "17"
        "start_drag_sound"      "ui.inv_pickup_highvalue"
        "end_drag_sound"        "ui.inv_drop_highvalue"
        "equip_sound"       "ui.inv_equip_highvalue"
    }
}

修改

所以我使用了h2o的正则表达式,它可以很好地格式化文件。我的错误是在上面的例子中我只放了一个带有1行键的部分。

我有一些其他部分文件,你有子键,在这种情况下,我应该为子键添加[]分隔符..:

2 个答案:

答案 0 :(得分:4)

那绝对是无论那种格式是什么,因为它不是json 。如果您保证它总是看起来与您的OP(每行一个键)完全相同,那么您可以通过这样做来修复它:

$json = preg_replace('/^(\s*"[^"]+")/m', '$1:', $json);

DEMO

正则表达式尸检

  • ^ - 线必须从这里开始
  • (\s*"[^"]+") - 一个捕获组(这是$1所匹配的):
    • \s* - 空格/制表符/换行符重复0次或更多次
    • " - 文字"字符
    • [^"]+ - 任何非"重复一次或多次的字符
    • " - 文字"字符
  • /m我们的修饰符(多行)。这意味着^将每行工作,而不是仅匹配整个字符串的开头。

修改

警告:这不会在值之间添加逗号!

您可能最好使用:

$json = preg_replace('/("[^"]+")(\s*{[^}]+})/', '$1:$2,', $json); //Add comma for brackets
$json = preg_replace('/("[^"]+")(\s*"[^"]+")/', '$1:$2,', $json); //Add comma for values

这也适用于单行,但要求除了令牌之外,你永远不会在其他任何地方使用字符{}"(甚至在字符串里面。)

再次修改

这似乎可以解决问题,可以使用json_decode并解析JSONLint,但它非常难看且模糊不清:

$json = preg_replace('/(")(\s*{)/m', '$1:$2', $json); //Fix colons after keys with brackets
$json = preg_replace('/(")([ \t]*")/m', '$1:$2', $json); //Fix colons after keys with values
$json = preg_replace('/(}\s*$)(\s*")/m', '$1,$2', $json); //Fix commas on lines with brackets
$json = preg_replace('/("\s*$)(\s*")/m', '$1,$2', $json); //Fix commas on lines with values
$json = preg_replace('/"[0-9]+":\s*{/m', '{', $json); //Fix invalid keys
$json = trim($json);

if ($json[0] == '{' && substr($json, -1) == '}') {
    $json = '[' . $json . ']';
} else {
    $json = '{' . $json . '}';
}

print_r(json_decode($json));

<强>更新

<?php
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_URL => "file.txt"
    ));
    $json = curl_exec($curl);

    $json = Horrible_JSON::Parse($json);
    print_r($json);

    class Horrible_JSON {
        public static function Parse($json) {
            $jsonLength = strlen($json);
            $realJSON = '';
            $isValue = false;
            for ($i = 0; $i < $jsonLength; $i++) {
                if ($json[$i] != "\n" && $json[$i] != "\r" && $json[$i] != "\t" && $json[$i] != " ") {
                    if ($json[$i] == '"') {
                        $nextQuote = strpos($json, '"', $i + 1);
                        $quoteContent = substr($json, $i + 1, $nextQuote - $i - 1);
                        if (!$isValue && preg_match('/^[0-9]+$/', $quoteContent)) {
                            $quoteContent = 'int_' . $quoteContent;
                        }
                        $realJSON .= '"' . $quoteContent . '"';
                        if (!$isValue) {
                            $realJSON .= ':';
                            $isValue = true;
                        } else {
                            $realJSON .= ',';
                            $isValue = false;
                        }
                        $i = $nextQuote;
                    } else {
                        if ($json[$i] == '{' || $json[$i] == '}') {
                            $isValue = false;
                        }
                        $realJSON .= $json[$i];
                        if ($json[$i] == '}') {
                            $realJSON .= ',';
                        }
                    }
                }
            }
            $realJSON = str_replace(',}', '}', $realJSON);
            $realJSON = substr($realJSON, 0, -1);

            if (substr($realJSON, 0, 1) == '{' && substr($realJSON, -1) == '}') {
                $realJSON = '[' . $realJSON . ']';
            } else {
                $realJSON = '{' . $realJSON . '}';
            }

            return json_decode($realJSON);
        }
    }
?>

答案 1 :(得分:0)

如果无法访问原始文件,那么对其进行逆向工程就很难完成。

如果这是一次性的,那么只需使用文本编辑器 - 很明显在哪里插入':'以使其看起来像JSON文件。

如果你需要处理很多这些 - 然后与产生数据的人联系并要求格式的正式定义或者他们将格式更改为JSON。

如果这些都不可能,那么编写代码以在两个引用的实体之间注入:是微不足道的。 但您无法保证这是对文件格式的有效解释