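I'm trying to import JSON data into a DataFrame with json_normalize, but I can't get it to work.
My data (the nested `c1` key is really named `a` in the real data, the same as the top-level `a` key):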
[
    {
        "a": "A1",
        "b": "B1",
        "c": [
            {
                "c1": "C111",
                "c2": "C121",
                "c3": ["C1131","C1132"]
            }
        ]
    },
    {
        "a": "A2",
        "b": "B2",
        "c": [
            {
                "c1": "C211",
                "c2": "C212",
                "c3": ["C2131","C2132"]
            },
            {
                "c1": "C221",
                "c2": "C222",
                "c3": ["C2231"]
            }
        ]
    }
]
I want to build a DataFrame like this:
    a c1(a)    c2                 c3
0  A1  C111  C121  ["C1131","C1132"]
1  A2  C211  C212  ["C2131","C2132"]
2  A2  C221  C222          ["C2231"]
When I use json_normalize, it raises a ValueError:
entity_df = json_normalize(data, 'c', 'a')
ValueError: Conflicting metadata name a, need distinguishing prefix
How should I change the json_normalize arguments? Any help would be greatly appreciated.
Answer 0 (score: 1)
You can try:
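from collections import defaultdict

import pandas as pd

# Build one list per output column; each nested record in "c" becomes a row.
norm_data = defaultdict(list)
for item in data:
    for element in item['c']:
        # Repeat the parent-level "a" for every nested record.
        norm_data['a'].append(item['a'])
        for k, v in element.items():
            # The nested key (named "a" or "c1") goes into its own column.
            if k in {'a', 'c1'}:
                norm_data['c1(a)'].append(v)
            else:
                norm_data[k].append(v)

pd.DataFrame(norm_data)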
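One caveat with building a list per column: if a nested record is missing a key, the lists end up with different lengths and `pd.DataFrame` refuses to build the frame. Below is a minimal sketch of a row-based variant that avoids this. The function name `flatten_rows` and its parameters are made up for illustration, and the sample data is a shortened version of the question's with the nested key named `a`.

```python
def flatten_rows(data, record_key="c", meta_key="a", renamed="c1(a)"):
    """Return one dict per nested record, carrying the parent's meta_key along.

    flatten_rows and its parameter names are hypothetical helpers,
    not part of pandas.
    """
    rows = []
    for item in data:
        for element in item[record_key]:
            # Start each row with the parent-level value.
            row = {meta_key: item[meta_key]}
            for k, v in element.items():
                # A nested key that clashes with the parent key is renamed.
                row[renamed if k == meta_key else k] = v
            rows.append(row)
    return rows


data = [
    {"a": "A1", "c": [{"a": "C111", "c2": "C121"}]},
    # The last nested record has no "c2" key on purpose.
    {"a": "A2", "c": [{"a": "C211", "c2": "C212"}, {"a": "C221"}]},
]

rows = flatten_rows(data)
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` should give one row per nested record, with NaN where a key was missing.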
Answer 1 (score: 0)
data = [
    {
        "a": "A1",
        "b": "B1",
        "c": [
            {
                "c1": "C111",
                "c2": "C121",
                "c3": ["C1131","C1132"]
            }
        ]
    },
    {
        "a": "A2",
        "b": "B2",
        "c": [
            {
                "c1": "C211",
                "c2": "C212",
                "c3": ["C2131","C2132"]
            },
            {
                "c1": "C221",
                "c2": "C222",
                "c3": ["C2231"]
            }
        ]
    }
]
# pd.json_normalize is the pandas >= 1.0 name for pd.io.json.json_normalize
pd.json_normalize(data, "c", ["a", "b"])
Output:
     c1    c2              c3   a   b
0  C111  C121  [C1131, C1132]  A1  B1
1  C211  C212  [C2131, C2132]  A2  B2
2  C221  C222         [C2231]  A2  B2
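To get from this output to the layout asked for in the question (`a` first, `c1` shown as `c1(a)`, no `b`), a rename plus a column selection should be enough. This is only a sketch, assuming pandas 1.0 or later where `pd.json_normalize` is available; the target column name `c1(a)` is taken from the question.

```python
import pandas as pd

data = [
    {"a": "A1", "b": "B1",
     "c": [{"c1": "C111", "c2": "C121", "c3": ["C1131", "C1132"]}]},
    {"a": "A2", "b": "B2",
     "c": [{"c1": "C211", "c2": "C212", "c3": ["C2131", "C2132"]},
           {"c1": "C221", "c2": "C222", "c3": ["C2231"]}]},
]

# One row per record under "c", with the parent-level "a" attached as metadata.
df = pd.json_normalize(data, record_path="c", meta=["a"])

# Rename and reorder the columns to match the layout in the question.
df = df.rename(columns={"c1": "c1(a)"})[["a", "c1(a)", "c2", "c3"]]
print(df)
```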
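Answer 2 (score: 0)
If you have gone through the pain of obfuscating your real data, please make sure the mock data has the same characteristics as the real data.
Suppose you have the following JSON: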
json_data = [
    {
        "a": "A1",
        "b": "B1",
        "c": [
            {
                "a": "C111",
                "c2": "C121",
                "c3": ["C1131","C1132"]
            }
        ]
    },
    {
        "a": "A2",
        "b": "B2",
        "c": [
            {
                "a": "C211",
                "c2": "C212",
                "c3": ["C2131","C2132"]
            },
            {
                "a": "C221",
                "c2": "C222",
                "c3": ["C2231"]
            }
        ]
    }
]
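You only need one line of code to extract it:
# pd.json_normalize is the pandas >= 1.0 name for pd.io.json.json_normalize
pd.json_normalize(json_data, 'c', ['a', 'b'], record_prefix='data.')
Result: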
  data.a data.c2         data.c3   a   b
0   C111    C121  [C1131, C1132]  A1  B1
1   C211    C212  [C2131, C2132]  A2  B2
2   C221    C222         [C2231]  A2  B2
record_prefix='data.' is the "distinguishing prefix" that the error message is asking for.
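For completeness, here is a small sketch that reproduces the conflict and shows the other way around it: prefixing the metadata columns with `meta_prefix` instead of the record columns. It assumes pandas 1.0 or later; the prefix `parent.` is an arbitrary choice for illustration, and the data is a shortened version of the JSON above.

```python
import pandas as pd

json_data = [
    {"a": "A1", "b": "B1", "c": [{"a": "C111", "c2": "C121"}]},
    {"a": "A2", "b": "B2", "c": [{"a": "C211", "c2": "C212"}]},
]

# Without any prefix, the record key "a" collides with the meta key "a".
error = None
try:
    pd.json_normalize(json_data, "c", ["a", "b"])
except ValueError as exc:
    error = str(exc)
    print(error)

# Prefixing the metadata columns also removes the collision.
by_meta = pd.json_normalize(json_data, "c", ["a", "b"], meta_prefix="parent.")
print(by_meta.columns.tolist())
```

Which side to prefix is mostly a matter of which column names you want to keep short; either one is enough to make the names unique.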