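I am trying to convert JSON to a CSV file that I can use for further analysis. The problem with my structure is that the JSON file contains a lot of nested dicts/lists.
I tried using pandas json_normalize(), but it only flattens the first level.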
import json
import pandas as pd
from cs import CloudStack

api_key = "xxxx"
secret = "xxxx"
endpoint = "xxxx"

cs = CloudStack(endpoint=endpoint,
                key=api_key,
                secret=secret)

# CloudStack API command; the response holds "count" and a "virtualmachine" list
virtual_machines = cs.listVirtualMachines()
# pandas >= 1.0 exposes json_normalize at the top level
test = pd.json_normalize(virtual_machines["virtualmachine"])
test.to_csv("test.csv", sep="|", index=False)
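Any idea how to flatten the whole JSON file, so that I can create a single-row entry in the CSV file for each single entry (in this case, a virtual machine)? I have tried several of the solutions posted here, but the result is always that only the first level gets flattened.
Here is the sample JSON (in this case I still get "securitygroup" and "nic" output in JSON format):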
{
    "count": 13,
    "virtualmachine": [
        {
            "id": "1082e2ed-ff66-40b1-a41b-26061afd4a0b",
            "name": "test-2",
            "displayname": "test-2",
            "securitygroup": [
                {
                    "id": "9e649fbc-3e64-4395-9629-5e1215b34e58",
                    "name": "test",
                    "tags": []
                }
            ],
            "nic": [
                {
                    "id": "79568b14-b377-4d4f-b024-87dc22492b8e",
                    "networkid": "05c0e278-7ab4-4a6d-aa9c-3158620b6471"
                },
                {
                    "id": "3d7f2818-1f19-46e7-aa98-956526c5b1ad",
                    "networkid": "b4648cfd-0795-43fc-9e50-6ee9ddefc5bd",
                    "traffictype": "Guest"
                }
            ],
            "hypervisor": "KVM",
            "affinitygroup": [],
            "isdynamicallyscalable": false
        }
    ]
}
Thank you and best regards, 博斯特让
Answer 0 (score: 9)
Thanks to gyx-hh, this has been solved:
I used the following function (details can be found here):
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
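Unfortunately, this flattens the whole JSON completely, which means that if you have multi-level JSON (many nested dicts), it may flatten everything into a single row with tons of columns.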
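If one row per virtual machine is the goal, the flattened dicts can be written straight to CSV. A minimal sketch, assuming the standard-library csv module and a hypothetical helper vms_to_csv (not part of the original answer); it repeats the flatten_json logic so the block runs on its own, and uses the union of all keys as the header because VMs with more NICs produce more columns:

```python
import csv
import io
from typing import Any, Dict, List


def flatten_json(y: Any) -> Dict[str, Any]:
    """Same approach as flatten_json above: keys and list indices joined with "_"."""
    out: Dict[str, Any] = {}

    def flatten(x: Any, name: str = "") -> None:
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + "_")
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + str(i) + "_")
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


def vms_to_csv(vms: List[Dict[str, Any]], sep: str = "|") -> str:
    """Hypothetical helper: one CSV row per VM, returned as a string."""
    rows = [flatten_json(vm) for vm in vms]
    # header = union of all flattened keys, in first-seen order
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    # keys missing from a row (e.g. a VM with fewer NICs) become empty cells
    writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter=sep)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

To write a file, something like `with open("test.csv", "w", newline="") as f: f.write(vms_to_csv(data["virtualmachine"]))` should do. Note that with this flattening, empty lists such as "tags": [] produce no column at all.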
What I ended up using was json_normalize(), specifying the structure I needed. A nice example of how to do that can be found here.
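As an illustration of specifying the structure, here is a sketch against a trimmed copy of the sample JSON from the question, assuming pandas >= 1.0 (where json_normalize is available as pd.json_normalize). Called without extra arguments, json_normalize expands nested dicts but leaves lists such as "nic" as Python lists in a single cell; record_path and meta turn each NIC into its own row and repeat the chosen VM fields on it. The prefix values are arbitrary choices here; some prefix is needed because both levels have an "id" key:

```python
import pandas as pd

# Trimmed copy of the sample JSON from the question
data = {
    "count": 13,
    "virtualmachine": [
        {
            "id": "1082e2ed-ff66-40b1-a41b-26061afd4a0b",
            "name": "test-2",
            "hypervisor": "KVM",
            "nic": [
                {"id": "79568b14-b377-4d4f-b024-87dc22492b8e",
                 "networkid": "05c0e278-7ab4-4a6d-aa9c-3158620b6471"},
                {"id": "3d7f2818-1f19-46e7-aa98-956526c5b1ad",
                 "networkid": "b4648cfd-0795-43fc-9e50-6ee9ddefc5bd",
                 "traffictype": "Guest"},
            ],
        }
    ],
}

# Default call: VM fields become columns, but "nic" stays a list in one cell
vms = pd.json_normalize(data["virtualmachine"])

# One row per NIC, with selected VM fields repeated on every NIC row
nics = pd.json_normalize(
    data["virtualmachine"],
    record_path="nic",
    meta=["id", "name", "hypervisor"],
    record_prefix="nic.",
    meta_prefix="vm.",
)
```

The result can then be saved as before, e.g. `nics.to_csv("nics.csv", sep="|", index=False)`. A NIC without "traffictype" simply gets NaN in that column.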
I hope this helps someone, and thanks again to gyx-hh for the solution.
Best regards
Answer 1 (score: 4)
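IMO the accepted answer does not handle JSON arrays properly.
If a JSON object has an array as a value, it should be flattened into an array of objects, like
{'a': [1, 2]} -> [{'a': 1}, {'a': 2}]
instead of adding an index to the key.
Nested objects should be flattened by concatenating keys (e.g. with a dot as the separator), like
{'a': {'b': 1}} -> {'a.b': 1}
(this part is done correctly in the accepted answer).
With all these requirements in mind, I ended up with the following (developed and used in CPython 3.5.3):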
from functools import (partial,
                       singledispatch)
from itertools import chain
from typing import (Dict,
                    List,
                    TypeVar)

Serializable = TypeVar('Serializable', None, int, bool, float, str,
                       dict, list, tuple)
Array = List[Serializable]
Object = Dict[str, Serializable]


def flatten(object_: Object,
            *,
            path_separator: str = '.') -> Array[Object]:
    """
    Flattens given JSON object into list of objects with non-nested values.
    >>> flatten({'a': 1})
    [{'a': 1}]
    >>> flatten({'a': [1, 2]})
    [{'a': 1}, {'a': 2}]
    >>> flatten({'a': {'b': None}})
    [{'a.b': None}]
    """
    keys = set(object_)
    result = [dict(object_)]
    while keys:
        key = keys.pop()
        new_result = []
        for index, record in enumerate(result):
            try:
                value = record[key]
            except KeyError:
                new_result.append(record)
            else:
                if isinstance(value, dict):
                    # nested object: replace it with prefixed, flattened keys
                    del record[key]
                    new_value = flatten_nested_objects(
                        value,
                        prefix=key + path_separator,
                        path_separator=path_separator)
                    keys.update(new_value.keys())
                    new_result.append({**new_value, **record})
                elif isinstance(value, list):
                    # array: one output record per element; note that an
                    # empty array yields no records, so the record is dropped
                    del record[key]
                    new_records = [
                        flatten_nested_objects(sub_value,
                                               prefix=key + path_separator,
                                               path_separator=path_separator)
                        for sub_value in value]
                    keys.update(chain.from_iterable(map(dict.keys,
                                                        new_records)))
                    new_result.extend({**new_record, **record}
                                      for new_record in new_records)
                else:
                    new_result.append(record)
        result = new_result
    return result


@singledispatch
def flatten_nested_objects(object_: Serializable,
                           *,
                           prefix: str = '',
                           path_separator: str) -> Object:
    return {prefix[:-len(path_separator)]: object_}


@flatten_nested_objects.register(dict)
def _(object_: Object,
      *,
      prefix: str = '',
      path_separator: str) -> Object:
    result = dict(object_)
    for key in list(result):
        result.update(flatten_nested_objects(result.pop(key),
                                             prefix=(prefix + key
                                                     + path_separator),
                                             path_separator=path_separator))
    return result


@flatten_nested_objects.register(list)
def _(object_: Array,
      *,
      prefix: str = '',
      path_separator: str) -> Object:
    return {prefix[:-len(path_separator)]: list(map(partial(
        flatten_nested_objects,
        path_separator=path_separator),
        object_))}
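The two rules described in this answer (arrays become one record per element, nested objects become dotted keys) can also be sketched recursively, with itertools.product combining the rows produced by sibling keys. This is a hypothetical alternative (flatten_rows is not from the answer), and unlike the code above it maps an empty array to a single None value instead of dropping the record:

```python
from itertools import product
from typing import Any, Dict, List


def flatten_rows(obj: Any, prefix: str = "", sep: str = ".") -> List[Dict[str, Any]]:
    """Flatten a JSON value into a list of flat dicts (rows)."""
    key = prefix[:-len(sep)]  # prefix is empty or ends with sep
    if isinstance(obj, dict):
        # flatten every value on its own, then take the cartesian product
        parts = [flatten_rows(v, f"{prefix}{k}{sep}", sep) for k, v in obj.items()]
        rows = []
        for combo in product(*parts):
            row: Dict[str, Any] = {}
            for part in combo:
                row.update(part)
            rows.append(row)
        return rows
    if isinstance(obj, list):
        if not obj:
            return [{key: None}]  # assumption: keep empty arrays as None
        # each element becomes its own row under the parent key
        return [row for item in obj for row in flatten_rows(item, prefix, sep)]
    return [{key: obj}]
```

For example, `flatten_rows({'nic': [{'id': 'x'}, {'id': 'y'}], 'n': 1})` gives one row per NIC, each carrying `'n': 1`. With two arrays of two elements each, the product yields four rows.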
Answer 2 (score: 1)
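Cross-posted from https://stackoverflow.com/a/62186053/4355695 (but then tweaked further): in this repo, https://github.com/ScriptSmith/socialreaper/blob/master/socialreaper/tools.py#L8, I found an implementation of the list-inclusion comment by @roneo to the answer posted by @Imran.
I added checks to it to catch empty lists and empty dicts, and also added print lines that help one understand precisely how this function works. You can turn those prints off by setting crumbs=False.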
import collections.abc

crumbs = True


def flatten(dictionary, parent_key=False, separator='.'):
    """
    Turn a nested dictionary into a flattened dictionary
    :param dictionary: The dictionary to flatten
    :param parent_key: The string to prepend to dictionary's keys
    :param separator: The string used to separate flattened keys
    :return: A flattened dictionary
    """
    items = []
    for key, value in dictionary.items():
        if crumbs: print('checking:', key)
        new_key = str(parent_key) + separator + key if parent_key else key
        # collections.MutableMapping was removed in Python 3.10; use collections.abc
        if isinstance(value, collections.abc.MutableMapping):
            if crumbs: print(new_key, ': dict found')
            if not value.items():
                if crumbs: print('Adding key-value pair:', new_key, None)
                items.append((new_key, None))
            else:
                items.extend(flatten(value, new_key, separator).items())
        elif isinstance(value, list):
            if crumbs: print(new_key, ': list found')
            if len(value):
                for k, v in enumerate(value):
                    # pass the separator on, so list items use it too
                    items.extend(flatten({str(k): v}, new_key, separator).items())
            else:
                if crumbs: print('Adding key-value pair:', new_key, None)
                items.append((new_key, None))
        else:
            if crumbs: print('Adding key-value pair:', new_key, value)
            items.append((new_key, value))
    return dict(items)
Test:
ans = flatten({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3], 'e':{'f':[], 'g':{}} })
print('\nflattened:',ans)
Output:
checking: a
Adding key-value pair: a 1
checking: c
c : dict found
checking: a
Adding key-value pair: c.a 2
checking: b
c.b : dict found
checking: x
Adding key-value pair: c.b.x 5
checking: y
Adding key-value pair: c.b.y 10
checking: d
d : list found
checking: 0
Adding key-value pair: d.0 1
checking: 1
Adding key-value pair: d.1 2
checking: 2
Adding key-value pair: d.2 3
checking: e
e : dict found
checking: f
e.f : list found
Adding key-value pair: e.f None
checking: g
e.g : dict found
Adding key-value pair: e.g None
flattened: {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'd.0': 1, 'd.1': 2, 'd.2': 3, 'e.f': None, 'e.g': None}
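It does the job I need: I throw any complicated JSON at it and it flattens it out for me. I added a check to the original code so that it handles empty lists too.
Credit to https://github.com/ScriptSmith, whose repo is where I found the initial function.
Testing the OP's sample JSON, here is the output: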
{'count': 13,
'virtualmachine.0.id': '1082e2ed-ff66-40b1-a41b-26061afd4a0b',
'virtualmachine.0.name': 'test-2',
'virtualmachine.0.displayname': 'test-2',
'virtualmachine.0.securitygroup.0.id': '9e649fbc-3e64-4395-9629-5e1215b34e58',
'virtualmachine.0.securitygroup.0.name': 'test',
'virtualmachine.0.securitygroup.0.tags': None,
'virtualmachine.0.nic.0.id': '79568b14-b377-4d4f-b024-87dc22492b8e',
'virtualmachine.0.nic.0.networkid': '05c0e278-7ab4-4a6d-aa9c-3158620b6471',
'virtualmachine.0.nic.1.id': '3d7f2818-1f19-46e7-aa98-956526c5b1ad',
'virtualmachine.0.nic.1.networkid': 'b4648cfd-0795-43fc-9e50-6ee9ddefc5bd',
'virtualmachine.0.nic.1.traffictype': 'Guest',
'virtualmachine.0.hypervisor': 'KVM',
'virtualmachine.0.affinitygroup': None,
'virtualmachine.0.isdynamicallyscalable': False}
So you'll see that the 'tags' and 'affinitygroup' keys are also handled and added to the output. The original code was omitting them.
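If the crumbs flag is only there for debugging, one possible variant (a sketch, not the answer's code) routes those messages through the standard logging module at DEBUG level, so no global flag is needed. It keeps the same behavior (empty dicts and lists become None, list items get their index as a key part) and builds the result dict directly:

```python
import logging
from collections.abc import MutableMapping
from typing import Any, Dict

logger = logging.getLogger(__name__)


def flatten(dictionary: MutableMapping, parent_key: str = "",
            separator: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts/lists; empty dicts and lists become None."""
    items: Dict[str, Any] = {}
    for key, value in dictionary.items():
        logger.debug("checking: %s", key)
        new_key = f"{parent_key}{separator}{key}" if parent_key else str(key)
        if isinstance(value, MutableMapping):
            if value:
                items.update(flatten(value, new_key, separator))
            else:
                logger.debug("adding key-value pair: %s None", new_key)
                items[new_key] = None
        elif isinstance(value, list):
            if value:
                # list items get their index as key; the separator is passed on
                for index, item in enumerate(value):
                    items.update(flatten({str(index): item}, new_key, separator))
            else:
                logger.debug("adding key-value pair: %s None", new_key)
                items[new_key] = None
        else:
            logger.debug("adding key-value pair: %s %r", new_key, value)
            items[new_key] = value
    return items
```

To see the trace, enable it with `logging.basicConfig(level=logging.DEBUG)` before calling `flatten`.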
Answer 3 (score: 1)
I tried a BFS approach, where I store (parent, val) in a queue only if val is of dict type.
from collections import deque


def flattern_json(d):
    if len(d) == 0:
        return {}
    q = deque()
    res = dict()
    for key, val in d.items():  # This loop pushes the top-most keys and values into the queue.
        if not isinstance(val, dict):  # If it's not a dict
            if isinstance(val, list):  # If it's a list, check whether its values contain dict objects.
                temp = list()  # Temp list for the values we need that are not dicts.
                for v in val:
                    if not isinstance(v, dict):
                        temp.append(v)
                    else:
                        q.append((key, v))  # If the value is a dict, push it along with its parent key.
                if len(temp) > 0:
                    res[key] = temp
            else:
                res[key] = val
        else:
            q.append((key, val))
    while q:
        k, v = q.popleft()  # Take the parent and the value out of the queue.
        for key, val in v.items():
            new_parent = k + "_" + key  # The new parent is old_parent + "_" + current key.
            if isinstance(val, list):
                temp = list()
                for item in val:
                    if not isinstance(item, dict):
                        temp.append(item)
                    else:
                        q.append((new_parent, item))
                # note: len(temp) >= 0 is always true, so a list of dicts also leaves an
                # (empty) list entry, e.g. 'virtualmachine_nic': [] in the output below
                if len(temp) >= 0:
                    res[new_parent] = temp
            elif not isinstance(val, dict):
                res[new_parent] = val
            else:
                q.append((new_parent, val))
    return res
It works with the given JSON. I join keys with _ to flatten the JSON, instead of using 0, 1 list indices.
from pprint import pprint
# d is the sample JSON from the question, loaded as a dict (e.g. with json.load)
pprint(flattern_json(d))
It gives the following output:
{'count': 13,
'virtualmachine_affinitygroup': [],
'virtualmachine_displayname': 'test-2',
'virtualmachine_hypervisor': 'KVM',
'virtualmachine_id': '1082e2ed-ff66-40b1-a41b-26061afd4a0b',
'virtualmachine_isdynamicallyscalable': False,
'virtualmachine_name': 'test-2',
'virtualmachine_nic': [],
'virtualmachine_nic_id': '3d7f2818-1f19-46e7-aa98-956526c5b1ad',
'virtualmachine_nic_networkid': 'b4648cfd-0795-43fc-9e50-6ee9ddefc5bd',
'virtualmachine_nic_traffictype': 'Guest',
'virtualmachine_securitygroup': [],
'virtualmachine_securitygroup_id': '9e649fbc-3e64-4395-9629-5e1215b34e58',
'virtualmachine_securitygroup_name': 'test',
'virtualmachine_securitygroup_tags': []}
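Because list indices are dropped, entries from several dicts in the same list end up under the same key and the later one wins: in the output above only the second NIC's id remains. A sketch of the same queue-based (BFS) idea that also queues non-empty lists and uses the element index as a key part, so nothing is overwritten (flatten_bfs is a hypothetical name, not from the answer):

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


def flatten_bfs(data: Dict[str, Any], sep: str = "_") -> Dict[str, Any]:
    """Breadth-first flattening that keeps list indices in the keys."""
    result: Dict[str, Any] = {}
    queue: Deque[Tuple[str, Any]] = deque([("", data)])
    while queue:
        prefix, node = queue.popleft()
        # dicts contribute their keys, lists their element indices
        items = node.items() if isinstance(node, dict) else enumerate(node)
        for key, value in items:
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            if isinstance(value, (dict, list)) and value:
                queue.append((path, value))  # non-empty container: expand later
            else:
                result[path] = value  # scalar or empty container
    return result
```

For the sample JSON this should yield separate keys such as virtualmachine_0_nic_0_id and virtualmachine_0_nic_1_id, while empty lists like affinitygroup stay as [].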
Answer 4 (score: 0)
Just pass your dictionary in here:
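def getKeyValuePair(dic, master_dic=None, master_key=None):
    # create a fresh dict per top-level call; a mutable default ({})
    # would be shared between calls and keep old results
    if master_dic is None:
        master_dic = {}
    for key in list(dic.keys()):
        # build the full path so deeper levels keep every parent key
        full_key = key if master_key is None else str(master_key) + '_' + str(key)
        if isinstance(dic[key], dict):
            getKeyValuePair(dic[key], master_dic=master_dic, master_key=full_key)
        else:
            master_dic[full_key] = dic[key]
    return master_dic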
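Answer 5 (score: 0)
I use this simple function to normalize data and flatten it into JSON. It accepts lists, dicts and tuples and flattens them into a single-level JSON-style dict.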
from datetime import datetime
from decimal import Decimal
from typing import Union


def normalize_data_to_json(raw_data: Union[list, dict, tuple], parent=""):
    result = {}
    # normalise key names to snake case (single underscore)
    parent = parent.lower().replace(" ", "_") if isinstance(parent, str) else parent
    if isinstance(parent, str) and parent.startswith("__"):
        # if parent has no parent, remove the double underscore and treat it as an int if it is a digit, else as a str
        # treating it as an int is better if the passed data is a list, so your output is an index-based dict
        parent = int(parent.lstrip("_")) if parent.lstrip("_").isdigit() else parent.lstrip("_")
    # handle str, int, float, and Decimal.
    # you can easily add more data types as per your data
    # (bool and None are not listed, so they fall through to str() below)
    if type(raw_data) in [str, int, float, Decimal]:
        result[parent] = float(raw_data) if isinstance(raw_data, Decimal) else raw_data
    # normalise datetime objects
    elif isinstance(raw_data, datetime):
        result[parent] = raw_data.strftime("%Y-%m-%d %H:%M:%S")
    # normalise dicts and all nested dicts.
    # all levels are joined with a double underscore to link a parent key name with its children
    elif isinstance(raw_data, dict):
        for k, v in raw_data.items():
            k = f'{parent}__{k}' if parent else k
            result.update(normalize_data_to_json(v, parent=k))
    # normalise lists and tuples
    elif type(raw_data) in [list, tuple]:
        for i, sub_item in enumerate(raw_data, start=1):
            result.update(normalize_data_to_json(sub_item, f"{parent}__{i}"))
    # any data which did not match the above data types is normalised using its __str__
    else:
        result[parent] = str(raw_data)
    return result
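Answer 6 (score: 0)
In case anyone else finds themselves looking for a solution that is better suited to subsequent programmatic processing:
Flattening lists means you then have to deal with headings that depend on list lengths, etc. I wanted a solution where, if there are 2 lists of e.g. 2 elements each, four rows are generated, yielding every valid potential data row (see the actual examples below):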
from collections.abc import Mapping


class MapFlattener:
    def __init__(self):
        self.headings = []
        self.rows = []

    def add_rows(self, headings, rows):
        self.headings = [*self.headings, *headings]
        if self.rows:
            # cartesian product: every existing row is combined with every new row
            new_rows = []
            for base_row in self.rows:
                for row in rows:
                    new_rows.append([*base_row, *row])
            self.rows = new_rows
        else:
            self.rows = rows

    def __call__(self, mapping):
        for heading, value in mapping.items():
            if isinstance(value, Mapping):
                # flatten nested mappings recursively and prefix their headings
                sub_headings, sub_rows = MapFlattener()(value)
                sub_headings = [f'{heading}:{sub_heading}' for sub_heading in sub_headings]
                self.add_rows(sub_headings, sub_rows)
                continue
            if isinstance(value, list):
                # one row per list element
                self.add_rows([heading], [[e] for e in value])
                continue
            self.add_rows([heading], [[value]])
        return self.headings, self.rows


def map_flatten(mapping):
    return MapFlattener()(mapping)
This creates output that is more in line with relational data:
In [22]: map_flatten({'l': [1,2]})
Out[22]: (['l'], [[1], [2]])
In [23]: map_flatten({'l': [1,2], 'n': 7})
Out[23]: (['l', 'n'], [[1, 7], [2, 7]])
In [24]: map_flatten({'l': [1,2], 'n': 7, 'o': {'a': 1, 'b': 2}})
Out[24]: (['l', 'n', 'o:a', 'o:b'], [[1, 7, 1, 2], [2, 7, 1, 2]])
This is particularly useful if you are using the CSV in spreadsheets etc. and need to process the flattened data.
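Since map_flatten returns a (headings, rows) pair, writing it out for a spreadsheet only needs csv.writer. A minimal sketch; the helper name rows_to_csv is hypothetical, and the input is the result shown for In [24]:

```python
import csv
import io
from typing import Any, List


def rows_to_csv(headings: List[str], rows: List[List[Any]]) -> str:
    """Render a (headings, rows) pair as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(headings)  # header line
    writer.writerows(rows)     # one line per relational row
    return buf.getvalue()


headings, rows = (['l', 'n', 'o:a', 'o:b'], [[1, 7, 1, 2], [2, 7, 1, 2]])
print(rows_to_csv(headings, rows))
```

For a file, pass an object opened with `open("out.csv", "w", newline="")` to `csv.writer` instead of the StringIO buffer.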
Answer 7 (score: -1)
Output in jsonpath format:
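def convert(f):
    out = {}

    def flatten(x, name=None):
        if isinstance(x, dict):
            for a in x:
                val = '.'.join((name, a)) if name else a
                flatten(x[a], val)
        elif isinstance(x, list):
            for i, a in enumerate(x):
                # (name or '') avoids a TypeError when the top-level value is a list
                flatten(a, (name or '') + f'[{i}]')
        else:
            # only None becomes ""; 0, False and "" are kept as they are
            out[name] = x if x is not None else ""

    flatten(f)
    return out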