Question

The following code

import pandas as pd

dic = {'_id': '5436e3abbae478396759f0cf', 'meta': {'clinical': {'benign_malignant': 'benign', 'age_approx': 55, 'sex': 'female', 'diagnosis': 'nevus', 'diagnosis_confirm_type': None, 'anatom_site_general': 'anterior torso', 'melanocytic': True}, 'acquisition': {'image_type': 'dermoscopic', 'pixelsX': 1022, 'pixelsY': 767}}, 'name': 'ISIC_0000000'}

frame = pd.io.json.json_normalize(dic)

raises a

KeyError: 'diagnosis_confirm_type'

I am using pandas version 0.23.0. The same code runs without error in version 0.22.0.

Update

Apparently there really is a bug in 0.23.0 that causes this. See https://github.com/pandas-dev/pandas/pull/21164
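For anyone stuck on 0.23.0, one possible workaround is to flatten the dictionary by hand before building the frame. This is only a sketch, under the assumption that the error is triggered by the `None` value nested below the top level (the key named in the KeyError). The helper `flatten` is a hypothetical name of my own, not part of pandas.

```python
# Same record as in the question, wrapped for readability.
dic = {
    '_id': '5436e3abbae478396759f0cf',
    'meta': {
        'clinical': {
            'benign_malignant': 'benign', 'age_approx': 55, 'sex': 'female',
            'diagnosis': 'nevus', 'diagnosis_confirm_type': None,
            'anatom_site_general': 'anterior torso', 'melanocytic': True,
        },
        'acquisition': {'image_type': 'dermoscopic', 'pixelsX': 1022, 'pixelsY': 767},
    },
    'name': 'ISIC_0000000',
}


def flatten(record, parent_key="", sep="."):
    """Hypothetical helper: flatten nested dicts into dotted keys, keeping None values."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts and merge their flattened keys.
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


flat = flatten(dic)
print(sorted(flat))
```

Wrapping the result in a list and passing it to `pd.DataFrame` should then give a one-row frame with dotted column names, similar to what `json_normalize` returned in 0.22.0.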

Answer 1

If you originally receive the data as strings, you don't even need a regular expression:

validPJson = [line.replace('None', '"None"').replace('True', '"True"') for line in invalidJsonObjects]

See here for why this is preferable to a regular expression: Use Python's string.replace vs re.sub
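A minimal, self-contained sketch of the replacement approach. The sample line is made up for illustration and assumes the input already uses double quotes, so that the unquoted `None` and `True` literals are the only thing the `json` module rejects.

```python
import json

# Hypothetical input: JSON-like lines containing Python's None/True literals.
invalid_json_objects = [
    '{"diagnosis_confirm_type": None, "melanocytic": True}',
]

# Quote the literals so every line becomes valid JSON.
valid_json = [
    line.replace('None', '"None"').replace('True', '"True"')
    for line in invalid_json_objects
]

records = [json.loads(line) for line in valid_json]
print(records)
```

Note that the values come back as the strings "None" and "True", not as `None` and `True`, and that a plain `replace` also rewrites those substrings wherever else they occur in the text (inside a longer word, for example).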

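Edit: From the comments I understand that your actual problem is loading a file in that format without fixing it first, and that this is why you get errors on load. (By the way, those errors really should be in your question; leaving them out only confuses the many people who are trying to help.)

My suggestion is to fix the file first, with something like: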

with open(pathToFile, 'r') as fp:
    contents = fp.read()
with open(pathToFile, 'w') as fp:
    fp.write(contents.replace('None', '"None"').replace('True', '"True"'))

Only after that, try reading the file with json and see whether it works.
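The whole round trip can be sketched as follows. The file name and its content are hypothetical, and a temporary directory is used so the example does not touch real data.

```python
import json
import os
import tempfile

# Hypothetical file content: one JSON-like object with Python literals.
raw = '{"diagnosis_confirm_type": None, "melanocytic": True}'

with tempfile.TemporaryDirectory() as tmp_dir:
    path_to_file = os.path.join(tmp_dir, "record.json")
    with open(path_to_file, "w") as fp:
        fp.write(raw)

    # Step 1: rewrite the file in place with the literals quoted.
    with open(path_to_file, "r") as fp:
        contents = fp.read()
    with open(path_to_file, "w") as fp:
        fp.write(contents.replace('None', '"None"').replace('True', '"True"'))

    # Step 2: only now parse the file as JSON.
    with open(path_to_file, "r") as fp:
        data = json.load(fp)

print(data)
```

Because the file is overwritten, it is probably worth keeping a backup copy of the original when doing this on real data.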

Unexplained KeyError when using the pandas normalize method
