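I'm running into some kind of encoding problem when retrieving a third-party feed: json_decode() fails, and json_last_error() reports "Unexpected control character found" (JSON_ERROR_CTRL_CHAR).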
From what I've read, this can be caused by non-UTF-8 characters in the mix.
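I ran a copy of the JSON through a linter and it is valid. Copying and pasting the JSON from the remote feed into a string and decoding that works fine; it only fails when the feed is fetched directly via file_get_contents.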
{
"numberOfResults": 124,
"queryTime": 0,
"products": [
{
"productId": "9130047$0290f955-ce36-46c9-9771-184f05985c62",
"status": null,
"serviceId": null,
"productName": null,
"serviceName": null,
"productDescription": null,
"serviceDescription": null,
"productCategoryId": null,
"nearestLocation": null,
"boundary": null,
"distanceToLocation": null,
"startDate": null,
"endDate": null,
"productImage": null,
"serviceImage": null,
"tqual": null,
"trip_advisor": null,
"freeEntry": null,
"booster": null,
"starRating": null,
"rateFrom": null,
"rateTo": null,
"productClassifications": null,
"internet_service_ssid": null,
"internet_service_type": null,
"linked_productid": null,
"states": null,
"suburbs": null,
"addresses": null,
"cities": null,
"comms_em": null,
"comms_mb": null,
"comms_burl": null,
"comms_url": null,
"comms_ph": null,
"comms_fx": null,
"comms_wap": null,
"internet_points": null
}
],
"facetGroups": []
}
It's just a simple decode...
$raw = file_get_contents($url);
$result = json_decode($raw, false);
// json_last_error() shows JSON_ERROR_CTRL_CHAR
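Since the pasted copy decodes but the fetched bytes do not, one way to narrow this down is to look at the first raw bytes of the response. The sketch below is only an illustration: guess_feed_encoding() is a hypothetical helper, the sample string is made up to stand in for the result of file_get_contents($url), and the heuristic only covers UTF-8 and UTF-16.

```php
<?php
// Hypothetical helper (not part of the original post): guess a payload's
// encoding from its byte-order mark, or from the NUL bytes that UTF-16
// produces when encoding ASCII text. UTF-32 is deliberately ignored.
function guess_feed_encoding(string $raw): string
{
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8-BOM';
    }
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE';
    }
    if (strncmp($raw, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';
    }
    if (strlen($raw) >= 2) {
        // No BOM: ASCII text in UTF-16 has a NUL in every other byte
        if ($raw[0] !== "\x00" && $raw[1] === "\x00") {
            return 'UTF-16LE';
        }
        if ($raw[0] === "\x00" && $raw[1] !== "\x00") {
            return 'UTF-16BE';
        }
    }
    return 'unknown';
}

// Made-up sample standing in for file_get_contents($url):
// the text {"n" encoded as UTF-16LE, preceded by a BOM
$raw = "\xFF\xFE" . "{\x00\"\x00n\x00\"\x00";
echo bin2hex(substr($raw, 0, 8)), "\n"; // hex dump of the leading bytes
echo guess_feed_encoding($raw), "\n";
```

If the hex dump shows a 00 after every ASCII byte, the payload is very likely UTF-16 rather than UTF-8, which would explain why a copy pasted through an editor behaves differently.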
Answer 0 (score: 0)
Thanks to @UlrichEckhardt for the suggestion. The link below provides some decent regular expressions, in case anyone else runs into this problem.
// Modified from http://magp.ie/2011/01/06/remove-non-utf8-characters-from-string-with-php/
// Simply strip out incompatible chars
function lint_json($string) {
    // Remove control characters, overlong 2-byte sequences, stray continuation
    // bytes and characters above U+10000 (matches are deleted, not replaced with ?)
    $string = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]|[\x00-\x7F][\x80-\xBF]+|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S', '', $string);
    // Remove overlong 3-byte sequences and UTF-16 surrogates
    $string = preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]|\xED[\xA0-\xBF][\x80-\xBF]/S', '', $string);
    return $string;
}
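To check whether stripping control characters actually helps for a given payload, a narrower sketch can be handy. Everything here is illustrative: decode_without_control_chars() is a hypothetical name, the sample payload is made up, and the pattern only removes raw C0 control bytes (keeping tab, LF and CR) plus DEL, so it is not a substitute for the fuller regex above.

```php
<?php
// Hypothetical helper (not part of the original post): remove raw control
// bytes except tab, LF and CR, then decode and fail loudly on any error.
function decode_without_control_chars(string $raw)
{
    $clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $raw);
    $data = json_decode($clean, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('JSON decode failed: ' . json_last_error_msg());
    }
    return $data;
}

// Made-up payload with a stray 0x01 byte inside a string value
$sample = '{"productId": "91' . "\x01" . '30047"}';

var_dump(json_decode($sample));   // NULL: the raw control byte is rejected
var_dump(json_last_error() === JSON_ERROR_CTRL_CHAR);

$data = decode_without_control_chars($sample);
echo $data['productId'], "\n";    // the value with the stray byte removed
```

Throwing on failure rather than returning null makes it obvious when the cleanup was not enough, which is usually a hint that the real problem is the encoding of the whole payload rather than a few stray bytes.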
Edit
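After further investigation, it came down to the feed serving its JSON as UTF-16, which json_decode cannot handle because it expects UTF-8 input. The following code fixes the issue.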
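function lint_json2($string) {
    // Convert the UTF-16LE payload to UTF-8; the //IGNORE suffix belongs on the target charset
    $string = iconv('UTF-16LE', 'UTF-8//IGNORE', $string);
    // Dirty, but strip anything (such as a BOM) before the first opening brace
    $string = strstr($string, '{');
    return $string;
}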
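A slightly more defensive variant could pick the byte order from the BOM instead of hard-coding it. This is only a sketch under stated assumptions: decode_utf16_json() is a hypothetical name, it relies on the mbstring extension being available, it assumes UTF-16LE when no BOM is present (as in this feed), and the sample payload is made up.

```php
<?php
// Hypothetical helper (not part of the original post): convert a UTF-16
// payload to UTF-8 according to its BOM, then decode it.
function decode_utf16_json(string $raw)
{
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $utf8 = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    } elseif (strncmp($raw, "\xFE\xFF", 2) === 0) {
        $utf8 = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    } else {
        // Assumption: without a BOM, treat the payload as UTF-16LE
        $utf8 = mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');
    }

    $data = json_decode($utf8, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('JSON decode failed: ' . json_last_error_msg());
    }
    return $data;
}

// Made-up UTF-16LE sample with a BOM, standing in for the remote feed
$sample = "\xFF\xFE" . mb_convert_encoding(
    '{"numberOfResults": 124, "products": []}',
    'UTF-16LE',
    'UTF-8'
);

$data = decode_utf16_json($sample);
echo $data['numberOfResults'], "\n";
```

Removing the BOM explicitly avoids relying on strstr() to drop it, and it keeps working for payloads whose top-level value is an array rather than an object.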