How do I fix badly formatted JSON in PHP?

Time: 2012-11-05 17:03:28

Tags: php mysql json

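I receive a data feed in JSON format, which is the only format available. In PHP I decode it with json_decode, but decoding was failing, and I found that the JSON is generated with unescaped double quotes inside some people's nicknames. I verified this using http://jsonformatter.curiousconcept.com

I have no control over how the data is created, but I have to deal with this broken format when it occurs. The parsed data will be inserted into a MySQL table.

For example: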

"contact1": "David "Dave" Letterman",

json_decode returns NULL for this. If I save the file manually and change the double quotes around the nickname Dave to single quotes, everything works.

$json_string = file_get_contents($json_download);
$json_array = json_decode($json_string, true);

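How can I repair the broken JSON before json_decode processes $json_string? What should be done to pre-process the file: backslash-escape the double quotes around the nickname, or change them to single quotes? And is it a good idea to store double quotes like these in MySQL at all?

I don't know when this will occur in any given data feed, so I don't want to check only contact1 for inner double quotes. Is there a way in PHP to take a line like the example above and backslash-escape every double quote after the colon except the outer ones? Thanks!

Here is the corrected code, as provided by tftd: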

<?php
// This:
// "contact1": "David "Dave" Letterman",
// Needs to look like this to be decoded by JSON:
// "contact1": "David \"Dave\" Letterman",

$data ='"contact1": "David "Dave" Letterman",';
function replace($match){
    $key = trim($match[1]);
    $val = trim($match[2]);

    if($val[0] == '"')
        $val = '"'.addslashes(substr($val, 1, -1)).'"';
    else if($val[0] == "'")
        $val = "'".addslashes(substr($val, 1, -1))."'";

    return $key.": ".$val;
}
$preg = preg_replace_callback("#([^{:]*):([^,}]*)#i",'replace',$data);
var_dump($preg);
$json_array = json_decode($preg);
var_dump($json_array);
echo $json_array . "\n";
echo $preg . "\n";
?>

This is the output:

string(39) ""contact1": "David \"Dave\" Letterman","
NULL

"contact1": "David \"Dave\" Letterman",

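A note on the NULL above: the repaired string is still only a fragment (a bare key/value pair with a trailing comma), not a complete JSON document, so json_decode rejects it even though the inner quotes are now escaped. A minimal sketch, assuming the feed is meant to be a single JSON object; the variable names $fragment and $document are illustrative only:

```php
<?php
// The repaired fragment, exactly as shown in the output above.
$fragment = '"contact1": "David \"Dave\" Letterman",';

// A bare pair with a trailing comma is not a complete JSON document.
var_dump(json_decode($fragment, true));   // NULL
echo json_last_error_msg(), "\n";         // reports why decoding failed

// Assumption: the pair belongs inside one object, so drop the
// trailing comma and wrap the pair in braces before decoding.
$document = '{' . rtrim(trim($fragment), ',') . '}';
$decoded  = json_decode($document, true);
var_dump($decoded);
```

With the braces added and the trailing comma removed, the decoded array holds the name with its original double quotes.
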
5 Answers:

Answer 0: (score: 3)

I have my own jsonFixer() function. It works in two steps: garbage removal (to even out inconsistent formatting) and reformatting.

<?php
  function jsonFixer($json){
    $patterns     = [];
    /** garbage removal */
    $patterns[0]  = "/([\s:,\{}\[\]])\s*'([^:,\{}\[\]]*)'\s*([\s:,\{}\[\]])/"; //Find any character except colons, commas, curly and square brackets surrounded or not by spaces preceded and followed by spaces, colons, commas, curly or square brackets...
    $patterns[1]  = '/([^\s:,\{}\[\]]*)\{([^\s:,\{}\[\]]*)/'; //Find any left curly brackets surrounded or not by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[2]  =  "/([^\s:,\{}\[\]]+)}/"; //Find any right curly brackets preceded by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[3]  = "/(}),\s*/"; //JSON.parse() doesn't allow trailing commas
    /** reformatting */
    $patterns[4]  = '/([^\s:,\{}\[\]]+\s*)*[^\s:,\{}\[\]]+/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets followed by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[5]  = '/["\']+([^"\':,\{}\[\]]*)["\']+/'; //Find one or more of quotation marks or/and apostrophes surrounding any character except colons, commas, curly and square brackets...
    $patterns[6]  = '/(")([^\s:,\{}\[\]]+)(")(\s+([^\s:,\{}\[\]]+))/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by quotation marks followed by one or more spaces and  one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[7]  = "/(')([^\s:,\{}\[\]]+)(')(\s+([^\s:,\{}\[\]]+))/"; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by apostrophes followed by one or more spaces and  one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[8]  = '/(})(")/'; //Find any right curly brackets followed by quotation marks...
    $patterns[9]  = '/,\s+(})/'; //Find any comma followed by one or more spaces and a right curly bracket...
    $patterns[10] = '/\s+/'; //Find one or more spaces...
    $patterns[11] = '/^\s+/'; //Find one or more spaces at start of string...

    $replacements     = [];
    /** garbage removal */
    $replacements[0]  = '$1 "$2" $3'; //...and put quotation marks surrounded by spaces between them;
    $replacements[1]  = '$1 { $2'; //...and put spaces between them;
    $replacements[2]  = '$1 }'; //...and put a space between them;
    $replacements[3]  = '$1'; //...so, remove trailing commas of any right curly brackets;
    /** reformatting */
    $replacements[4]  = '"$0"'; //...and put quotation marks surrounding them;
    $replacements[5]  = '"$1"'; //...and replace by single quotation marks;
    $replacements[6]  = '\\$1$2\\$3$4'; //...and add back slashes to its quotation marks;
    $replacements[7]  = '\\$1$2\\$3$4'; //...and add back slashes to its apostrophes;
    $replacements[8]  = '$1, $2'; //...and put a comma followed by a space character between them;
    $replacements[9]  = ' $1'; //...and replace by a space followed by a right curly bracket;
    $replacements[10] = ' '; //...and replace by one space;
    $replacements[11] = ''; //...and remove it.

    $result = preg_replace($patterns, $replacements, $json);

    return $result;
  }
?>

Usage example:

<?php
  // Received badly formatted json:
  // {"contact1": "David "Dave" Letterman", price : 30.00, 'details' : "Greatest 'Hits' Album"}
  $json_string = '{"contact1": "David "Dave" Letterman", price : 30.00, \'details\' : "Greatest \'Hits\' Album"}';
  echo jsonFixer($json_string);
?>

Result:

{"contact1": "David \"Dave\" Letterman", "price" : "30.00", "details" : "Greatest \'Hits\' Album"}

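Note: this has not been tested against every possible badly formatted JSON string, but I use it on complex multi-level JSON strings and it has worked well so far.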
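
One way to use a fixer like this defensively is to run it only when strict decoding fails, and to fail loudly if the repaired text still does not decode. The helper below is a hypothetical sketch, not part of the answer above; decodeWithRepair is an invented name, and it assumes the top level of the feed is a JSON object or array:

```php
<?php
// Hypothetical helper: try strict decoding first and fall back to a
// repair callback only when the input does not decode as it is.
function decodeWithRepair(string $json, callable $repair): array
{
    $decoded = json_decode($json, true);
    if (is_array($decoded)) {
        return $decoded; // the feed was valid, nothing to repair
    }

    // Second attempt on the repaired text.
    $decoded = json_decode($repair($json), true);
    if (!is_array($decoded)) {
        throw new RuntimeException(
            'JSON still invalid after repair: ' . json_last_error_msg()
        );
    }
    return $decoded;
}
```

It could then be called as decodeWithRepair($json_string, 'jsonFixer'), so that well-formed feeds pass through untouched and a failed repair is not silently stored.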

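Answer 1: (score: 1)

As others have already pointed out, the best course is to tell your client about the problem with the JSON formatting. Ask them to send a bug report to the original developer or company so that it can be fixed at the source. If they cannot fix it, offer your solution: the string simply needs to go through addslashes before it is passed to json_encode.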
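
If the producer happens to generate the feed with PHP (an assumption, the question does not say), building the data as an array and passing it to json_encode is enough to get the inner quotes escaped, as this small sketch illustrates. The $record variable is made up for the example:

```php
<?php
// Assumption: the producer holds the record as a PHP array before output.
$record = ['contact1' => 'David "Dave" Letterman'];

// json_encode escapes embedded double quotes on its own.
$json = json_encode($record);
echo $json, "\n"; // {"contact1":"David \"Dave\" Letterman"}

// Round trip: decoding gives back the nickname with its original quotes.
$back = json_decode($json, true);
```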

If for some reason you end up having to fix the formatting yourself, here is an approach that might work for you:

$data = '"contact1": "David "Dave" Letterman", "contact2": "Peter "Robert" Smith",{\'test\': \'working "something"\'}';
function replace($match){
    $key = trim($match[1]);
    $val = trim($match[2]);

    if($val[0] == '"')
        $val = '"'.addslashes(substr($val, 1, -1)).'"';
    else if($val[0] == "'")
        $val = "'".addslashes(substr($val, 1, -1))."'";

    return $key.": ".$val;
}
$preg = preg_replace_callback("#([^{:]*):([^,}]*)#i",'replace',$data);
var_dump($preg);
// string '"contact1": "David \"Dave\" Letterman", "contact2": "Peter \"Robert\" Smith",{'test': 'working \"something\"'}' (length=110)

Keep in mind that this will probably break if somebody changes the JSON format again.

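Answer 2: (score: 0)

Tell them to escape their strings before output. You could even offer to fix it for them or provide a code solution.

Otherwise, you can use preg_replace with a regular expression.

See "Replacing specified double quotes in text with preg_replace".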

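Answer 3: (score: 0)

As others have said, you can do a search and replace, but the hard part is writing the fuzzy matching rules, because in order to parse the data at all you have to assume something about it. You may need to assume that: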

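1a) keys do not contain colons
1b) or quotes in keys are properly escaped

2a) values do not contain commas
2b) or quotes in values are properly escaped.

Even then, you can run into cases that confuse your parsing, and it gets worse if they put comments in the JSON. (Non-conforming, but very common.)

Depending on the data, you could use line breaks to decide when you are looking at a new key, but again, that is unreliable and you start making big assumptions.

So, long story short: either you make assumptions that may turn out to be wrong at any time, or you need to get them to fix the data.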
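
The line-break idea above can be sketched as follows. This is an illustration under strong assumptions, not a general parser: each key/value pair sits on its own line, keys contain no double quotes, and a string value runs from the first quote after the colon to the last quote on the line. The function name escapeInnerQuotesPerLine is made up for this example:

```php
<?php
// Sketch: escape inner double quotes line by line.
// Assumes one "key": "value" pair per line (see the lead-in).
function escapeInnerQuotesPerLine(string $json): string
{
    $lines = preg_split('/\R/', $json);
    foreach ($lines as $i => $line) {
        // Group 1: key and colon. Group 2: the value, matched greedily so
        // it extends to the LAST quote on the line. Group 3: optional comma.
        if (preg_match('/^(\s*"[^"]*"\s*:\s*)"(.*)"(\s*,?\s*)$/', $line, $m)) {
            // Escape only quotes not already preceded by a backslash.
            $value = preg_replace('/(?<!\\\\)"/', '\\\\"', $m[2]);
            $lines[$i] = $m[1] . '"' . $value . '"' . $m[3];
        }
    }
    return implode("\n", $lines);
}

$broken = "{\n\"contact1\": \"David \"Dave\" Letterman\",\n\"contact2\": \"Jay Leno\"\n}";
$decoded = json_decode(escapeInnerQuotesPerLine($broken), true);
var_dump($decoded);
```

Lines that do not look like a quoted key followed by a quoted value (braces, numbers, nested structures) are left untouched, which is also where the assumptions stop holding: two pairs on one line would be treated as a single value.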

Answer 4: (score: 0)

Regular expressions are not reliable once values contain commas, square brackets, or embedded JSON strings; that is where the worries and nightmares start. In "php json_decode fails without quotes on key", it is suggested to use the PEAR Services_JSON package, which gives the most satisfactory results on invalid JSON once its code has been fixed up for the class names.