Parsing deeply nested JSON with pandas json_normalize

Time: 2019-05-31 20:50:19

Tags: pandas python-2.7

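I can't seem to extract all the metadata I need from the nested JSON with json_normalize. See the JSON below. I'm trying to retrieve the title ("Some Book") from the content node, but I can only get as deep as content itself.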

For example:

json_normalize(result_data, 'data', ['title', 'key',['group_dimensions','content']])

yields

,date,units,group_dimensions.content,key,title
0,2019-03-17T00:00:00.000Z,0.0,"{u'key': u'1358883623', u'title': u'Some Book'}",143489,Czech Republic
1,2019-03-24T00:00:00.000Z,10.0,"{u'key': u'1358883623', u'title': u'Some Book'}",143489,Czech Republic
2,2019-03-31T00:00:00.000Z,13.0,"{u'key': u'1358883623', u'title': u'Some Book'}",143489,Czech Republic
3,2019-03-17T00:00:00.000Z,0.0,"{u'key': u'1358883623', u'title': u'Some Book'}",143487,Romania

But I still need to extract "title", which sits one level deeper:

json_normalize(result_data, 'data', ['title', 'key',['group_dimensions',['content','title']]])

This produces the error: TypeError: sequence item 1: expected string, list found

Ideas?
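In json_normalize, each meta entry is either a single key or a flat list of keys describing a path, and that path is joined with "." to build the column name. A list nested inside the path, as in ['group_dimensions', ['content', 'title']], cannot be joined, which is what the TypeError reports. Below is a minimal sketch of two workarounds: spell the path flat as ['group_dimensions', 'content', 'title'], or keep the dict column and pull 'title' out of it afterwards. The result_data here is hypothetical, shaped after the CSV output above since the real JSON is not reproduced; the sketch assumes pandas >= 1.0, where the function is pd.json_normalize (older versions import it from pandas.io.json).

```python
import pandas as pd

# Hypothetical input shaped after the output above (the real JSON is not shown):
# each record has 'title', 'key', a nested group_dimensions -> content dict,
# and a 'data' list of weekly rows.
result_data = [
    {
        "title": "Czech Republic",
        "key": 143489,
        "group_dimensions": {"content": {"key": "1358883623", "title": "Some Book"}},
        "data": [
            {"date": "2019-03-17T00:00:00.000Z", "units": 0.0},
            {"date": "2019-03-24T00:00:00.000Z", "units": 10.0},
        ],
    },
    {
        "title": "Romania",
        "key": 143487,
        "group_dimensions": {"content": {"key": "1358883623", "title": "Some Book"}},
        "data": [{"date": "2019-03-17T00:00:00.000Z", "units": 0.0}],
    },
]

# Option 1: give the deeper path as ONE flat list of keys.
# The column is named by joining the path: "group_dimensions.content.title".
df = pd.json_normalize(
    result_data,
    record_path="data",
    meta=["title", "key", ["group_dimensions", "content", "title"]],
)
print(df)

# Option 2: keep the dict column from the original call, then read 'title' from it.
df2 = pd.json_normalize(
    result_data,
    record_path="data",
    meta=["title", "key", ["group_dimensions", "content"]],
)
df2["content_title"] = df2["group_dimensions.content"].apply(lambda d: d["title"])
print(df2[["date", "title", "content_title"]])
```

Option 1 keeps everything in one call; option 2 is handy when you need several fields from the same nested dict.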

1 answer:

Answer 0 (score: 0)

You can use the following package, which I wrote. It expands every dict it finds in the DataFrame into new columns.

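import pandas as pd
import flat_table

# Keep group_dimensions, key, metadata and title by dropping the 'data' list column
# (more robust than iloc[:,1:], which assumes 'data' is the first column).
df = pd.DataFrame(result_data).drop(columns='data')
flat_table.normalize(df)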

It finds all the dictionaries and expands them into new columns:
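For comparison, plain pandas can do the dict-expansion part on its own: called without record_path, json_normalize flattens every nested dict into dot-separated columns and leaves list values (such as 'data') in place. A minimal sketch, assuming pandas >= 1.0 and a hypothetical result_data with the same shape as in the question:

```python
import pandas as pd

# Hypothetical records shaped like the question's data.
result_data = [
    {"title": "Czech Republic", "key": 143489,
     "group_dimensions": {"content": {"key": "123456789", "title": "Some Book"}},
     "data": [{"date": "2019-03-17T00:00:00.000Z", "units": 0.0}]},
    {"title": "Romania", "key": 143487,
     "group_dimensions": {"content": {"key": "123456789", "title": "Some Book"}},
     "data": [{"date": "2019-03-17T00:00:00.000Z", "units": 0.0}]},
]

# No record_path: nested dicts become columns such as
# "group_dimensions.content.title"; the 'data' list column is dropped here.
wide = pd.json_normalize(result_data).drop(columns="data")
print(wide)
```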

   index store_front_ica.title store_front_ica.key content.title content.key  key_x         title_x         title_y   key_y
0      0        Czech Republic              143489     Some Book   123456789 143489  Czech Republic  Czech Republic  143489
1      1               Romania              143487     Some Book   123456789 143487         Romania         Romania  143487

The package also expands lists into rows. Here is the full df:
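If you would rather avoid an extra dependency, the same two rules (explode lists into rows, expand dicts into columns) can be sketched in plain pandas. flatten_frame is a hypothetical helper written for illustration, not flat_table's implementation; it assumes pandas >= 1.1 for explode(..., ignore_index=True) and leaves alone any column that mixes dicts, lists and scalars.

```python
import pandas as pd


def flatten_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: repeatedly explode list columns into rows and
    expand dict columns into dot-prefixed columns until neither remains."""
    df = df.reset_index(drop=True)
    while True:
        for col in df.columns:
            values = df[col].dropna()
            if values.empty:
                continue
            if values.map(lambda v: isinstance(v, list)).all():
                # One row per list element; the other columns are repeated.
                df = df.explode(col, ignore_index=True)
                break
            if values.map(lambda v: isinstance(v, dict)).all():
                # Non-dict cells (e.g. NaN) become {} so rows stay aligned.
                dicts = [v if isinstance(v, dict) else {} for v in df[col]]
                expanded = pd.json_normalize(dicts).add_prefix(f"{col}.")
                df = pd.concat([df.drop(columns=col), expanded], axis=1)
                break
        else:
            # No pure list or dict column left: the frame is flat.
            return df


# Hypothetical input shaped like the question's data.
result_data = [
    {"title": "Czech Republic", "key": 143489,
     "group_dimensions": {"content": {"key": "123456789", "title": "Some Book"}},
     "data": [{"date": "2019-03-17T00:00:00.000Z", "units": 0.0},
              {"date": "2019-03-24T00:00:00.000Z", "units": 10.0}]},
    {"title": "Romania", "key": 143487,
     "group_dimensions": {"content": {"key": "123456789", "title": "Some Book"}},
     "data": [{"date": "2019-03-17T00:00:00.000Z", "units": 0.0}]},
]

flat = flatten_frame(pd.DataFrame(result_data))
print(flat)
```

Column names follow json_normalize's dot convention, so they differ from the flat_table output shown here (for example, there are no _x/_y suffixes).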

   index  units                      date store_front_ica.title  store_front_ica.key content.title content.key   key_x         title_x        title_y   key_y
0      0    0.0  2019-03-17T00:00:00.000Z        Czech Republic               143489     Some Book   123456789  143489  Czech Republic Czech Republic  143489 
1      0   10.0  2019-03-24T00:00:00.000Z        Czech Republic               143489     Some Book   123456789  143489  Czech Republic Czech Republic  143489 
2      0   13.0  2019-03-31T00:00:00.000Z        Czech Republic               143489     Some Book   123456789  143489  Czech Republic Czech Republic  143489 
3      1    0.0  2019-03-17T00:00:00.000Z               Romania               143487     Some Book   123456789  143487         Romania        Romania  143487 
4      1    0.0  2019-03-24T00:00:00.000Z               Romania               143487     Some Book   123456789  143487         Romania        Romania  143487 
5      1  200.0  2019-03-31T00:00:00.000Z               Romania               143487     Some Book   123456789  143487         Romania        Romania  143487 

You can try flat-table.