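I have a JSON file with four levels of nesting, shown below, and I want to normalize it down to one level of nesting.
The input file is as follows: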
{
"@index": "40",
"row": [
{
"column": [
{
"text": {
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "85.10",
"@y": "663.12",
"@width": "250.01",
"@height": "12.00",
"#text": "text 1"
}
}
]
},
{
"column": [
{
"text": {
"@fontName": "Times New Roman",
"@fontSize": "8.0",
"@x": "121.10",
"@y": "675.36",
"@width": "348.98",
"@height": "8.04",
"#text": "text 2"
}
},
{
"text": {
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "473.30",
"@y": "676.92",
"@width": "42.47",
"@height": "12.00",
"#text": "text 3"
}
}
]
},
{
"column": [
{
"text": {
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "85.10",
"@y": "690.72",
"@width": "433.61",
"@height": "12.00",
"#text": "text 4"
}
}
]
}
]
}
The desired output looks like this:
{
"@index": "40",
"row": [
{
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "85.10",
"@y": "663.12",
"@width": "250.01",
"@height": "12.00",
"#text": "Text 1"
},
{
"@fontName": "Times New Roman",
"@fontSize": "8.0",
"@x": "121.10",
"@y": "675.36",
"@width": "348.98",
"@height": "8.04",
"#text": "Text 2"
},
{
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "473.30",
"@y": "676.92",
"@width": "42.47",
"@height": "12.00",
"#text": "Text 3"
},
{
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "85.10",
"@y": "690.72",
"@width": "433.61",
"@height": "12.00",
"#text": "Text 4"
}
]
}
So far I have the following pandas code, but I don't know how to continue normalizing down to one level.
import json
import pandas as pd  # pd.json_normalize flattens JSON into a DataFrame (pandas >= 1.0)
# load the JSON object; the raw string keeps the backslashes from being read as escapes
with open(r'D:\Files\JSON\4Level.json') as f:
    d = json.load(f)
nycphil = pd.json_normalize(d['row'])
print(nycphil.head(4))
This is the current output, which shows that column is still a nested element:
column
0 [{'text': {'@fontName': 'Times New Roman', '@f...
1 [{'text': {'@fontName': 'Times New Roman', '@f...
2 [{'text': {'@fontName': 'Times New Roman', '@f...
Printed with one level of nesting, it would be:
text.#text text.@fontName text.@fontSize ... text.@width text.@x text.@y
0 Text 1 Times New Roman 12.0 ... 250.01 85.10 663.12
1 Text 2 Times New Roman 8.0 ... 348.98 121.10 675.36
2 Text 3 Times New Roman 12.0 ... 42.47 473.30 676.92
3 Text 4 Times New Roman 12.0 ... 433.61 85.10 690.72
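Maybe someone can help me. Thanks for any help.
UPDATE
To keep the first sample input small, I removed some elements that the scripts apparently need in order to work. I am therefore now showing exactly the same structure as the real file, but your scripts do not work with this input. I think they need some adjustments; I have been trying, but I don't know how to change them to get the same output from this new input. Perhaps you can help me. Sorry for not showing the correct input from the start.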
{
"document":{
"page":[
{
"@index":"0",
"image":{
"@data":"ABC",
"@format":"png",
"@height":"620.00",
"@type":"base64encoded",
"@width":"450.00",
"@x":"85.00",
"@y":"85.00"
}
},
{
"@index":"1",
"row":[
{
"column":[
{
"text":""
},
{
"text":{
"#text":"Text1",
"@fontName":"Arial",
"@fontSize":"12.0",
"@height":"12.00",
"@width":"71.04",
"@x":"121.10",
"@y":"83.42"
}
}
]
},
{
"column":[
{
"text":""
},
{
"text":{
"#text":"Text2",
"@fontName":"Arial",
"@fontSize":"12.0",
"@height":"12.00",
"@width":"101.07",
"@x":"121.10",
"@y":"124.82"
}
}
]
}
]
},
{
"@index":"2",
"row":[
{
"column":{
"text":{
"#text":"Text3",
"@fontName":"Arial",
"@fontSize":"12.0",
"@height":"12.00",
"@width":"363.44",
"@x":"85.10",
"@y":"69.62"
}
}
},
{
"column":{
"text":{
"#text":"Text4",
"@fontName":"Arial",
"@fontSize":"12.0",
"@height":"12.00",
"@width":"382.36",
"@x":"85.10",
"@y":"83.42"
}
}
},
{
"column":{
"text":{
"#text":"Text5",
"@fontName":"Arial",
"@fontSize":"12.0",
"@height":"12.00",
"@width":"435.05",
"@x":"85.10",
"@y":"97.22"
}
}
}
]
},
{
"@index":"3"
}
]
}
}
Answer 0 (score: 2)
As an alternative to json_normalize(), you can also use a comprehension:
my_dict["row"] = [{k: v for k, v in col_entry["text"].items()} for entry in my_dict["row"] for col_entry in entry["column"]]
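Edit: fixed the code so that it covers multiple entries in each column list. Admittedly, this gets close to the pain threshold for nested comprehensions...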
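If the nested comprehension is hard to read, the same flattening can be written as a small function with explicit loops. This is only a minimal sketch, assuming the structure of the first example (every `row` entry has a `column` list whose items each hold a `text` dict); `flatten_rows` and the trimmed `sample` are illustrative names, not part of the original answer.

```python
def flatten_rows(page):
    """Return a copy of `page` whose "row" list holds the inner "text" dicts."""
    flat = []
    for row in page["row"]:
        for cell in row["column"]:
            flat.append(dict(cell["text"]))  # shallow copy of the text dict
    # Keep every other top-level key (such as "@index") untouched.
    return {**page, "row": flat}


# Trimmed-down sample in the shape of the first input.
sample = {
    "@index": "40",
    "row": [
        {"column": [{"text": {"@x": "85.10", "#text": "text 1"}}]},
        {"column": [{"text": {"@x": "121.10", "#text": "text 2"}},
                    {"text": {"@x": "473.30", "#text": "text 3"}}]},
    ],
}

result = flatten_rows(sample)
print([cell["#text"] for cell in result["row"]])
```

Unlike the in-place assignment to `my_dict["row"]`, this variant leaves the input dict unmodified and returns a new one, which may be easier to test.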
Answer 1 (score: 1)
Here is working code:
(56336255.json is the sample data you posted)
import json
import pprint
flat_data = dict()
with open('56336255.json') as f:
    data = json.load(f)
for k, v in data.items():
    if k == '@index':
        flat_data[k] = data[k]
    else:
        flat_data[k] = []
        for row in v:
            for cell in row['column']:
                flat_data[k].append(cell['text'])
pprint.pprint(flat_data)
Output
{'@index': '40',
'row': [{'#text': 'text 1',
'@fontName': 'Times New Roman',
'@fontSize': '12.0',
'@height': '12.00',
'@width': '250.01',
'@x': '85.10',
'@y': '663.12'},
{'#text': 'text 2',
'@fontName': 'Times New Roman',
'@fontSize': '8.0',
'@height': '8.04',
'@width': '348.98',
'@x': '121.10',
'@y': '675.36'},
{'#text': 'text 3',
'@fontName': 'Times New Roman',
'@fontSize': '12.0',
'@height': '12.00',
'@width': '42.47',
'@x': '473.30',
'@y': '676.92'},
{'#text': 'text 4',
'@fontName': 'Times New Roman',
'@fontSize': '12.0',
'@height': '12.00',
'@width': '433.61',
'@x': '85.10',
'@y': '690.72'}]}
Answer 2 (score: 1)
This does the job:
Full working example:
import json
# json_file is assumed to be an already-open file object for the input JSON
data = json.load(json_file)
flat = [column['text'] for entry in data['row'] for column in entry['column']]
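Answer 3 (score: 1)
You can use a list comprehension:
import json
# d is the loaded input dict; new_d is not defined in the post, so this line is an assumed reconstruction
new_d = {**d, 'row': [c['text'] for b in d['row'] for c in b['column']]}
print(json.dumps(new_d, indent=4))
Output: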
{
"@index": "40",
"row": [
{
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "85.10",
"@y": "663.12",
"@width": "250.01",
"@height": "12.00",
"#text": "text 1"
},
{
"@fontName": "Times New Roman",
"@fontSize": "8.0",
"@x": "121.10",
"@y": "675.36",
"@width": "348.98",
"@height": "8.04",
"#text": "text 2"
},
{
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "473.30",
"@y": "676.92",
"@width": "42.47",
"@height": "12.00",
"#text": "text 3"
},
{
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "85.10",
"@y": "690.72",
"@width": "433.61",
"@height": "12.00",
"#text": "text 4"
}
]
}
Edit: to flatten the nested structure from the updated input, you can use recursion with a generator:
import json
def flatten(d, t = ["image", "text"]):
    for a, b in d.items():
        if a in t:
            yield b
        elif isinstance(b, dict):
            yield from flatten(b)
        elif isinstance(b, list):
            for i in b:
                yield from flatten(i)
d = {'document': {'page': [{'@index': '0', 'image': {'@data': 'ABC', '@format': 'png', '@height': '620.00', '@type': 'base64encoded', '@width': '450.00', '@x': '85.00', '@y': '85.00'}}, {'@index': '1', 'row': [{'column': [{'text': ''}, {'text': {'#text': 'Text1', '@fontName': 'Arial', '@fontSize': '12.0', '@height': '12.00', '@width': '71.04', '@x': '121.10', '@y': '83.42'}}]}, {'column': [{'text': ''}, {'text': {'#text': 'Text2', '@fontName': 'Arial', '@fontSize': '12.0', '@height': '12.00', '@width': '101.07', '@x': '121.10', '@y': '124.82'}}]}]}, {'@index': '2', 'row': [{'column': {'text': {'#text': 'Text3', '@fontName': 'Arial', '@fontSize': '12.0', '@height': '12.00', '@width': '363.44', '@x': '85.10', '@y': '69.62'}}}, {'column': {'text': {'#text': 'Text4', '@fontName': 'Arial', '@fontSize': '12.0', '@height': '12.00', '@width': '382.36', '@x': '85.10', '@y': '83.42'}}}, {'column': {'text': {'#text': 'Text5', '@fontName': 'Arial', '@fontSize': '12.0', '@height': '12.00', '@width': '435.05', '@x': '85.10', '@y': '97.22'}}}]}, {'@index': '3'}]}}
# filter(None, ...) drops the empty-string "text" placeholders
print(json.dumps(list(filter(None, flatten(d))), indent=4))
Output:
[
{
"@data": "ABC",
"@format": "png",
"@height": "620.00",
"@type": "base64encoded",
"@width": "450.00",
"@x": "85.00",
"@y": "85.00"
},
{
"#text": "Text1",
"@fontName": "Arial",
"@fontSize": "12.0",
"@height": "12.00",
"@width": "71.04",
"@x": "121.10",
"@y": "83.42"
},
{
"#text": "Text2",
"@fontName": "Arial",
"@fontSize": "12.0",
"@height": "12.00",
"@width": "101.07",
"@x": "121.10",
"@y": "124.82"
},
{
"#text": "Text3",
"@fontName": "Arial",
"@fontSize": "12.0",
"@height": "12.00",
"@width": "363.44",
"@x": "85.10",
"@y": "69.62"
},
{
"#text": "Text4",
"@fontName": "Arial",
"@fontSize": "12.0",
"@height": "12.00",
"@width": "382.36",
"@x": "85.10",
"@y": "83.42"
},
{
"#text": "Text5",
"@fontName": "Arial",
"@fontSize": "12.0",
"@height": "12.00",
"@width": "435.05",
"@x": "85.10",
"@y": "97.22"
}
]
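The generator approach returns one flat list and does not keep the page boundaries. If the per-page `@index` should be preserved, one possible variant is sketched below. It assumes the structure of the updated input, where `column` may be either a list or a single dict and `text` may be an empty string; `as_list`, `flatten_page`, and the shortened `doc` are hypothetical names introduced only for illustration.

```python
def as_list(value):
    """Wrap a single item in a list so dict-or-list fields iterate uniformly."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def flatten_page(page):
    """Collect the non-empty "text" dicts of one page under a flat "row" list."""
    texts = []
    for row in as_list(page.get("row")):
        for cell in as_list(row.get("column")):
            text = cell.get("text")
            if isinstance(text, dict):  # skips the "" placeholders
                texts.append(text)
    # Copy every page-level key except the nested "row".
    flat = {k: v for k, v in page.items() if k != "row"}
    if texts:
        flat["row"] = texts
    return flat


# Shortened document in the shape of the updated input.
doc = {"document": {"page": [
    {"@index": "1", "row": [
        {"column": [{"text": ""}, {"text": {"#text": "Text1"}}]},
    ]},
    {"@index": "2", "row": [
        {"column": {"text": {"#text": "Text3"}}},
    ]},
    {"@index": "3"},
]}}

pages = [flatten_page(p) for p in as_list(doc["document"]["page"])]
print(pages)
```

Normalizing with `as_list` is one way to cope with converters that emit a dict for a single child and a list for several; whether pages without text should be kept or dropped is a design choice left open here.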
Answer 4 (score: 0)
Try this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out
expected_output = flatten_json(input_data) # This will convert
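Note that this approach produces a single flat dictionary whose keys encode the path to each value, rather than the list of text dicts shown in the question. A small self-contained illustration follows; the one-cell `input_data` is a hypothetical sample, and the function is restated with `isinstance` and `enumerate` only so the snippet runs on its own.

```python
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x  # drop the trailing '_'

    flatten(y)
    return out


# Hypothetical one-cell input in the shape of the first example.
input_data = {
    "@index": "40",
    "row": [{"column": [{"text": {"@x": "85.10", "#text": "text 1"}}]}],
}

flat = flatten_json(input_data)
for key, value in flat.items():
    print(key, '=', value)
```

Each list position becomes part of the key (for example `row_0_column_0_text_#text`), so this form suits a single wide record more than a table with one row per text element.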