I am trying to simplify the following, but it only works for JSON that is not triple-nested.
Working code:
import json
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
data = [{'masterName': 'AAAAAAAAAAA',
         'shortname': 'AA',
         'info': {
             'name': 'randomka'
         },
         'mainNames': [{'date': '2019-05-16', 'NumberOne': 1111},
                       {'date': '2019-06-22', 'NumberOne': 2222}]}
        ]
result = json_normalize(data, 'mainNames', ['masterName', 'shortname',
                                            ['info', 'name']], errors='ignore')
Not working:
data2 = [{"masterName": "AAAAAAAAAAA",
          "mainNames": [
              {
                  "numbers": [{
                      "date": "2019-05-16",
                      "NumberOne": 222}],
                  "name": "randomka"
              },
              {
                  "numbers": [{
                      "date": "2019-05-16",
                      "NumberOne": 222}],
                  "name": "randomka"
              }
          ]
          }]
json_normalize(data2, 'mainNames', ['masterName'], errors='ignore')
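It returns:
I have tried different record_path and meta arguments for json_normalize, but I cannot make it work for this triple-nested JSON. In other words, I cannot get all the columns in one go.
One alternative I tried works and looks close: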
json_normalize(data2, ['mainNames','numbers'], ['masterName'],errors='ignore')
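The output is almost an Excel-like view, with the data in columns. Expected view, as requested in the comments:
UPD: the data may have multiple branches: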
data2 = [{"masterName": "AAAAAAAAAAA",
          "mainNames": [
              {
                  "numbers": [{
                      "date": "2019-05-16",
                      "NumberOne": 222}],
                  "name": "randomka"
              },
              {
                  "numbers": [{
                      "date": "2019-05-16",
                      "NumberOne": 222},
                      {
                      "date": "2019-07-01",
                      "NumberOne": 341}],
                  "name": "randomka"
              }
          ]
          }]
Answer 0 (score: 0)
As @Aayush Mahajan suggested in the comments, it may be simpler to define your own function. Here is one way to do it for data2:
out = []
record = data2[0]  # Take the single top-level record
for main in record["mainNames"]:  # Iterate over "mainNames"
    sub_dict = {"masterName": record["masterName"]}  # Start a new dict (df row) with "masterName"
    sub_dict.update(main["numbers"][0])  # Add all fields from the first "numbers" entry
    sub_dict["name"] = main["name"]  # Add the "name" field
    out.append(sub_dict)  # Append the row dict to the output list
print(out)
# [{'masterName': 'AAAAAAAAAAA', 'date': '2019-05-16', 'NumberOne': 222, 'name': 'randomka'},
#  {'masterName': 'AAAAAAAAAAA', 'date': '2019-05-16', 'NumberOne': 222, 'name': 'randomka'}]
# Create the DataFrame from the list of row dicts
df = pd.DataFrame(out)
print(df)
#     masterName        date  NumberOne      name
# 0  AAAAAAAAAAA  2019-05-16        222  randomka
# 1  AAAAAAAAAAA  2019-05-16        222  randomka
Update:
You can add an inner loop to iterate over the numbers field:
out = []
record = data2[0]  # Take the single top-level record
for main in record["mainNames"]:  # Iterate over "mainNames"
    for numbers in main["numbers"]:  # Iterate over every "numbers" entry
        sub_dict = {"masterName": record["masterName"]}  # Start a new dict (df row) with "masterName"
        sub_dict.update(numbers)  # Add all fields from this "numbers" entry
        sub_dict["name"] = main["name"]  # Add the "name" field
        out.append(sub_dict)  # Append the row dict to the output list
print(out)
# [{'masterName': 'AAAAAAAAAAA', 'date': '2019-05-16', 'NumberOne': 222, 'name': 'randomka'},
#  {'masterName': 'AAAAAAAAAAA', 'date': '2019-05-16', 'NumberOne': 222, 'name': 'randomka'},
#  {'masterName': 'AAAAAAAAAAA', 'date': '2019-07-01', 'NumberOne': 341, 'name': 'randomka'}]
df = pd.DataFrame(out)
print(df)
#     masterName        date  NumberOne      name
# 0  AAAAAAAAAAA  2019-05-16        222  randomka
# 1  AAAAAAAAAAA  2019-05-16        222  randomka
# 2  AAAAAAAAAAA  2019-07-01        341  randomka
This also still works with the initial data2.
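If the top-level list may hold more than one record, the loops could be wrapped in a small helper. The following is only a sketch under that assumption: flatten_records and sample are hypothetical names that do not appear in the question, and .get() is used so that a missing key yields None or an empty list instead of raising.

```python
def flatten_records(records):
    """Return one flat dict per entry found under mainNames[*].numbers."""
    rows = []
    for record in records:                        # every top-level record
        for main in record.get("mainNames", []):  # every "mainNames" entry
            for numbers in main.get("numbers", []):  # every "numbers" entry
                row = {"masterName": record.get("masterName")}
                row.update(numbers)               # copy date, NumberOne, ...
                row["name"] = main.get("name")
                rows.append(row)
    return rows


# Same shape as the UPD data in the question
sample = [{"masterName": "AAAAAAAAAAA",
           "mainNames": [
               {"numbers": [{"date": "2019-05-16", "NumberOne": 222}],
                "name": "randomka"},
               {"numbers": [{"date": "2019-05-16", "NumberOne": 222},
                            {"date": "2019-07-01", "NumberOne": 341}],
                "name": "randomka"},
           ]}]

rows = flatten_records(sample)
for row in rows:
    print(row)
```

The resulting list of dicts can then be passed to pd.DataFrame(rows), as in the snippets above.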
Hope that helps!
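As a possible alternative to the hand-written loops, json_normalize itself may be able to produce the same rows in one call. The sketch below assumes pandas 1.0 or later (where the function is exposed as pd.json_normalize) and passes a nested meta path, ['mainNames', 'name'], next to the record path that the question already tried. The variable name sample is hypothetical, and the name column comes out labelled mainNames.name rather than name.

```python
import pandas as pd

# Same shape as the UPD data in the question
sample = [{"masterName": "AAAAAAAAAAA",
           "mainNames": [
               {"numbers": [{"date": "2019-05-16", "NumberOne": 222}],
                "name": "randomka"},
               {"numbers": [{"date": "2019-05-16", "NumberOne": 222},
                            {"date": "2019-07-01", "NumberOne": 341}],
                "name": "randomka"},
           ]}]

df = pd.json_normalize(
    sample,
    record_path=["mainNames", "numbers"],         # one row per "numbers" entry
    meta=["masterName", ["mainNames", "name"]],   # top-level field + field of each "mainNames" entry
    errors="ignore",
)
print(df)
```

If the plain name label is preferred, df.rename(columns={"mainNames.name": "name"}) should restore it.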