Encoding issue with json_decode

Date: 2014-11-11 21:26:00

Tags: php json

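I'm running into some kind of encoding problem when trying to retrieve a third-party feed: json_last_error() reports "Unexpected control character found".

From what I've read, this may be caused by non-UTF-8 characters somewhere in the data.

I ran the copied JSON through a linter and it is valid. Copying and pasting the JSON from the remote feed into a string and decoding that works fine; it only fails when the feed is fetched directly with file_get_contents.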
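Since the pasted copy decodes but the fetched one does not, the two strings must differ at the byte level, which neither a linter nor a copy from the browser will reveal. One way to check is to dump the first few bytes of the raw response in hex. This is a sketch using a hypothetical helper, hex_preview(); in practice you would pass it the string returned by file_get_contents($url).

```php
<?php
// Hypothetical debugging helper (not part of PHP): render the first $length bytes
// of a string as space-separated hex, so that a byte order mark (ef bb bf, ff fe,
// fe ff) or interleaved 00 bytes (typical of UTF-16) become visible.
function hex_preview(string $raw, int $length = 16): string
{
    return trim(chunk_split(bin2hex(substr($raw, 0, $length)), 2, ' '));
}

// Hard-coded sample for illustration; with the real feed this would be
// hex_preview(file_get_contents($url)).
echo hex_preview('{"numberOfResults": 124}'), PHP_EOL;
// 7b 22 6e 75 6d 62 65 72 4f 66 52 65 73 75 6c 74
```

Plain UTF-8 JSON shows one byte per ASCII character, as above; a pattern like "7b 00 22 00" would point to UTF-16 instead.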

{
    "numberOfResults": 124,
    "queryTime": 0,
    "products": [
        {
            "productId": "9130047$0290f955-ce36-46c9-9771-184f05985c62",
            "status": null,
            "serviceId": null,
            "productName": null,
            "serviceName": null,
            "productDescription": null,
            "serviceDescription": null,
            "productCategoryId": null,
            "nearestLocation": null,
            "boundary": null,
            "distanceToLocation": null,
            "startDate": null,
            "endDate": null,
            "productImage": null,
            "serviceImage": null,
            "tqual": null,
            "trip_advisor": null,
            "freeEntry": null,
            "booster": null,
            "starRating": null,
            "rateFrom": null,
            "rateTo": null,
            "productClassifications": null,
            "internet_service_ssid": null,
            "internet_service_type": null,
            "linked_productid": null,
            "states": null,
            "suburbs": null,
            "addresses": null,
            "cities": null,
            "comms_em": null,
            "comms_mb": null,
            "comms_burl": null,
            "comms_url": null,
            "comms_ph": null,
            "comms_fx": null,
            "comms_wap": null,
            "internet_points": null
        }
    ],
    "facetGroups": []
}

Just a simple decode:

$raw = file_get_contents($url);
$result = json_decode($raw, false);

// json_last_error() shows JSON_ERROR_CTRL_CHAR
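To see exactly which error json_decode() hit, it helps to check json_last_error() right after the call and include json_last_error_msg() (available since PHP 5.5) in the report. A minimal sketch; decode_or_fail() is a hypothetical helper name. On PHP 7.3+ the JSON_THROW_ON_ERROR flag makes json_decode() throw a JsonException itself, which achieves much the same.

```php
<?php
// Hypothetical helper: decode JSON and throw with the error code and message
// instead of silently returning null.
function decode_or_fail(string $raw)
{
    $result = json_decode($raw, false);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException(
            sprintf('json_decode failed (code %d): %s', json_last_error(), json_last_error_msg())
        );
    }
    return $result;
}

$data = decode_or_fail('{"numberOfResults": 124, "products": []}');
echo $data->numberOfResults, PHP_EOL;   // 124
```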

1 Answer:

Answer 0 (score: 0)

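Thanks to @UlrichEckhardt for the suggestion. The page credited in the code comment below has some decent regular expressions, in case anyone else runs into this problem.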

// Modified from http://magp.ie/2011/01/06/remove-non-utf8-characters-from-string-with-php/
// Simply strip out incompatible chars
function lint_json($string) {
    // Remove control characters (tab, LF and CR are kept as JSON whitespace), stray
    // continuation bytes (with the ASCII byte before them), overlong 2-byte sequences
    // (\xC0, \xC1 leads), 4-byte sequences and invalid lead bytes, and malformed
    // 2- and 3-byte sequences
    $string = preg_replace(
        '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]'
        . '|[\x00-\x7F][\x80-\xBF]+'
        . '|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'
        . '|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'
        . '|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S',
        '', $string);

    // Remove overlong 3-byte sequences and UTF-16 surrogates
    $string = preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]|\xED[\xA0-\xBF][\x80-\xBF]/S', '', $string);

    return $string;
}
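If the problem really were stray non-UTF-8 bytes inside string values, PHP 7.2 and later can deal with that without regular expressions: the JSON_INVALID_UTF8_IGNORE and JSON_INVALID_UTF8_SUBSTITUTE flags make json_decode() drop invalid UTF-8 sequences or replace them with U+FFFD. This is an alternative sketch, not what the answer used; it does not remove control characters and does not help when the whole payload is in another encoding such as UTF-16. The ISO-8859-1 sample string is made up for illustration.

```php
<?php
// Alternative sketch (PHP 7.2+): let json_decode() handle invalid UTF-8 itself.
// \xE9 is "é" in ISO-8859-1 and is not valid UTF-8 on its own.
$latin1 = "{\"name\":\"caf\xE9\"}";

var_dump(json_decode($latin1));                    // NULL
var_dump(json_last_error() === JSON_ERROR_UTF8);   // bool(true)

// Replace each invalid byte with U+FFFD (UTF-8 bytes ef bf bd)
$data = json_decode($latin1, false, 512, JSON_INVALID_UTF8_SUBSTITUTE);
echo bin2hex($data->name), PHP_EOL;                // 636166efbfbd
```

JSON_INVALID_UTF8_IGNORE would instead yield "caf", which is closer to what lint_json() does by deleting bytes.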

Edit

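After further investigation, it came down to the feed being served as UTF-16, which obviously causes problems for json_decode, since it only works with UTF-8 encoded strings. In UTF-16LE every ASCII character is followed by a 0x00 byte, and those NUL bytes are the "control characters" json_decode was complaining about. Presumably the copy/paste route worked because the text had already been decoded for display before it was copied. The following code fixes the issue.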

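function lint_json2($string) {
    // Convert from UTF-16LE to UTF-8; per the iconv() docs, //IGNORE is a suffix
    // on the target encoding, not the source
    $string = iconv('UTF-16LE', 'UTF-8//IGNORE', $string);

    // Dirty, but strip anything (e.g. a byte order mark) before the first opening brace;
    // note that strstr() returns false if there is no '{' at all
    $string = strstr($string, '{');

    return $string;
}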
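A somewhat more general sketch than lint_json2, assuming the feed may arrive as UTF-8 (with or without a BOM) or as UTF-16 in either byte order: detect the encoding from a byte order mark or, failing that, from the pattern of 00 bytes in the first four octets, which is the detection method described in section 3 of RFC 4627 (since obsoleted; RFC 8259 requires UTF-8). json_to_utf8() is a hypothetical name, UTF-32 is not handled, and the iconv extension is assumed to be available.

```php
<?php
// Hypothetical helper: normalise a JSON payload to UTF-8 before json_decode().
function json_to_utf8(string $raw): string
{
    // UTF-8 BOM: just drop it
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        return substr($raw, 3);
    }

    $from = null;
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {          // UTF-16LE BOM
        $from = 'UTF-16LE';
        $raw = substr($raw, 2);
    } elseif (strncmp($raw, "\xFE\xFF", 2) === 0) {    // UTF-16BE BOM
        $from = 'UTF-16BE';
        $raw = substr($raw, 2);
    } elseif (strlen($raw) >= 4
        && $raw[0] !== "\0" && $raw[1] === "\0" && $raw[2] !== "\0" && $raw[3] === "\0") {
        $from = 'UTF-16LE';                             // xx 00 xx 00
    } elseif (strlen($raw) >= 4
        && $raw[0] === "\0" && $raw[1] !== "\0" && $raw[2] === "\0" && $raw[3] !== "\0") {
        $from = 'UTF-16BE';                             // 00 xx 00 xx
    }

    if ($from === null) {
        return $raw;   // assume it is already UTF-8
    }

    $converted = iconv($from, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Could not convert payload from $from to UTF-8");
    }
    return $converted;
}

// Illustration: build UTF-16LE bytes (with BOM) for an ASCII JSON string by hand
$utf16 = "\xFF\xFE" . implode("\0", str_split('{"numberOfResults": 124}')) . "\0";
var_dump(json_decode(json_to_utf8($utf16))->numberOfResults);   // int(124)
```

With the real feed, usage would be json_decode(json_to_utf8(file_get_contents($url)), false). Unlike the strstr() trick, this only ever discards a BOM, never other leading content.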