json_normalize gives results character by character

Posted: 2019-02-28 10:43:42

Tags: python json pandas

I am currently trying to normalize a JSON file with pandas, and I'm running into a problem while processing it.

The JSON file looks like this:

{ "valid": false, 
  "checks": {"bank_check": {"valid": true, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"}, 
              "company_check": {"valid": true, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"}, 
              "ceo_check": {"valid": true, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"}}
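As the answer below points out, the snippet above is missing its final closing brace. A minimal sketch for checking that with Python's standard json module (the helper name try_parse is illustrative, not from the question): the text as posted fails to parse, while the same text plus one more } parses into nested dicts, with JSON true/false becoming Python True/False.

```python
import json

# The question's JSON exactly as posted (the outer object is never closed).
posted = """{ "valid": false,
  "checks": {"bank_check": {"valid": true, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"},
             "company_check": {"valid": true, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"},
             "ceo_check": {"valid": true, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"}}"""


def try_parse(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        print("invalid JSON:", exc)
        return None


print(try_parse(posted) is None)           # True: parsing fails as posted
d = try_parse(posted + "}")                # add the missing closing brace
print(d["checks"]["bank_check"]["valid"])  # True (a Python bool, not a string)
```

Parsing the file with json.load (or json.loads on its contents) before calling json_normalize ensures pandas receives dicts rather than raw text.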

I want to get the list of checks in a table like this:

| bank_check  | company_check | ceo_check|
------------------------------------------
| true        | true          | true     |

But when I use json_normalize, I get this:

[Image: Result of json_normalize character by character]

If I use works_data=json_normalize(d[1], record_path=['result', 'checks']), I get the error: string indices must be integers
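The pandas documentation describes record_path as the path in each object to a list of records, but in the JSON shown above checks is a dict keyed by check name (and the snippet has no result key, so that path must refer to a larger document that isn't shown). The error text is what Python reports when a str is indexed with a key, which hints that json_normalize ended up walking over strings somewhere along that path. A minimal sketch, assuming the parsed data looks like the question's JSON and pandas 1.0 or later: reshape checks into a list of records first, adding a hypothetical name field that keeps each check's key, and then let record_path walk that list.

```python
import pandas as pd

# The question's JSON, already parsed (JSON true/false -> Python True/False).
d = {
    "valid": False,
    "checks": {
        "bank_check": {"valid": True, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"},
        "company_check": {"valid": True, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"},
        "ceo_check": {"valid": True, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"},
    },
}

# record_path expects a list of records, so turn the dict of checks into one.
# Each record is a copy of the check plus a hypothetical "name" field holding its key.
wrapped = {"checks": [dict(check, name=name) for name, check in d["checks"].items()]}

records = pd.json_normalize(wrapped, record_path=["checks"])
print(records[["name", "valid"]])
```

This gives one row per check (long format); the answer below instead keeps a single row with one column per check.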

Has anyone run into this before, or does anyone know why I'm getting this strange result?

Thanks in advance for your replies.

1 answer:

Answer 0 (score: 1)

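I'm not sure why this problem occurs (note that the JSON in your example is missing its closing }). I tried the normalization myself and was able to produce the desired output: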

import pandas as pd  # pandas >= 1.0 exposes json_normalize at the top level

# JSON true/false are typed here as Python strings 'true'/'false'
d = { "valid": 'false', 
   "checks": {"bank_check": {"valid": 'true', "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"}, 
              "company_check": {"valid": 'true', "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"}, 
              "ceo_check": {"valid": 'true', "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"}}}

# Flatten the nested checks into one row with dotted column names
df = pd.json_normalize(d['checks'])

# Keep only the "<check>.valid" columns; sorted() gives the alphabetical order shown below
cols = sorted(col for col in df.columns if col.endswith('.valid'))

works_data = df[cols]

Output:

print (works_data)
  bank_check.valid ceo_check.valid company_check.valid
0             true            true                true
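To get exactly the headers from the question (bank_check, company_check, ceo_check) rather than the dotted bank_check.valid names, the .valid suffix can be stripped after selecting the columns. A minimal sketch, assuming pandas 1.0 or later and using real Python booleans instead of the 'true' strings above:

```python
import pandas as pd

d = {
    "valid": False,
    "checks": {
        "bank_check": {"valid": True, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"},
        "company_check": {"valid": True, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"},
        "ceo_check": {"valid": True, "reasons": {}, "last_checked_at": "2019-02-19", "first_checked_at": "2019-02-01"},
    },
}

# One row, with columns such as "bank_check.valid" and "bank_check.last_checked_at".
df = pd.json_normalize(d["checks"])

# Keep only the "<check>.valid" columns, in the order the checks appear in the JSON.
valid_cols = [col for col in df.columns if col.endswith(".valid")]

# Strip the ".valid" suffix so each column is named after its check.
works_data = df[valid_cols].rename(columns=lambda col: col[: -len(".valid")])
print(works_data)
```

Unlike the sorted() selection above, this keeps the order of the checks as they appear in the JSON, which matches the table the question asks for.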