Flattening nested JSON (dicts, lists) into a list to prepare it for writing to a DB

Date: 2016-04-19 23:58:06

Tags: python json list dictionary flatten

I'm still struggling with flattening nested JSON files. The nested items are either lists or dicts:

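Here is the file I want to flatten (unlike in my previous post, I kept it to a reasonable length: it only contains input[0] and none of the following items, because those would make it very long):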

input = [{'states': ['USED'], 'niceName': '1-series', 'id': 'BMW_1_Series',
            'years': [{'styles':
                       [{'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'},
                         'name': '128i 2dr Convertible (3.0L 6cyl 6M)', 'id': 100994560},
                        {'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'},
                          'name': '128i 2dr Coupe (3.0L 6cyl 6M)', 'id': 100974974},
                        {'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '135i 2dr Coupe (3.0L 6cyl Turbo 6M)', 'id': 100974975},
                        {'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '135i 2dr Convertible (3.0L 6cyl Turbo 6M)', 'id': 100994561}
                        ],
                       'states': ['USED'], 'id': 100524709, 'year': 2008},
                      {'styles':
                       [{'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '135i 2dr Coupe (3.0L 6cyl Turbo 6M)', 'id': 101082656}, 
                        {'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '128i 2dr Coupe (3.0L 6cyl 6M)', 'id': 101082655},
                        {'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '135i 2dr Convertible (3.0L 6cyl Turbo 6M)', 'id': 101082663},
                        {'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '128i 2dr Convertible (3.0L 6cyl 6M)', 'id': 101082662}
                        ], 
                       'states': ['USED'], 'id': 100503222, 'year': 2009},
                      {'styles': 
                       [{'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '128i 2dr Coupe (3.0L 6cyl 6M)', 'id': 101200599},
                        {'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '135i 2dr Coupe (3.0L 6cyl Turbo 6M)', 'id': 101200600}, 
                        {'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '135i 2dr Convertible (3.0L 6cyl Turbo 6M)', 'id': 101200607}, 
                        {'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '128i 2dr Convertible (3.0L 6cyl 6M)', 'id': 101200601}
                        ], 
                       'states': ['USED'], 'id': 100529091, 'year': 2010}, 
                      {'styles':
                       [{'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '128i 2dr Coupe (3.0L 6cyl 6M)', 'id': 101288165}, 
                        {'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '135i 2dr Coupe (3.0L 6cyl Turbo 6M)', 'id': 101288166}, 
                        {'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '135i 2dr Convertible (3.0L 6cyl Turbo 6M)', 'id': 101288298}, 
                        {'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '128i 2dr Convertible (3.0L 6cyl 6M)', 'id': 101288297}
                        ], 
                       'states': ['USED'], 'id': 100531309, 'year': 2011}, 
                      {'styles': 
                       [{'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '128i 2dr Convertible (3.0L 6cyl 6M)', 'id': 101381667}, 
                        {'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '135i 2dr Convertible (3.0L 6cyl Turbo 6M)', 'id': 101381668}, 
                        {'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '128i 2dr Coupe (3.0L 6cyl 6M)', 'id': 101381665}, 
                        {'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '135i 2dr Coupe (3.0L 6cyl Turbo 6M)', 'id': 101381666}
                        ], 
                       'states': ['USED'], 'id': 100534729, 'year': 2012}, 
                      {'styles': 
                       [{'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                          'name': '128i 2dr Coupe (3.0L 6cyl 6M)', 'id': 200428722},
                        {'trim': '128i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '128i 2dr Convertible (3.0L 6cyl 6M)', 'id': 200428721}, 
                        {'trim': '135is', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '135is 2dr Coupe (3.0L 6cyl Turbo 6M)', 'id': 200421701}, 
                        {'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '135i 2dr Coupe (3.0L 6cyl Turbo 6M)', 'id': 200428724}, 
                        {'trim': '135i', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '135i 2dr Convertible (3.0L 6cyl Turbo 6M)', 'id': 200428723}, 
                        {'trim': '128i SULEV', 'states': ['USED'], 'submodel': {'body': 'Coupe', 'niceName': 'coupe', 'modelName': '1 Series Coupe'}, 
                         'name': '128i SULEV 2dr Coupe (3.0L 6cyl 6M)', 'id': 200428726}, 
                        {'trim': '128i SULEV', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '128i SULEV 2dr Convertible (3.0L 6cyl 6M)', 'id': 200428725}, 
                        {'trim': '135is', 'states': ['USED'], 'submodel': {'body': 'Convertible', 'niceName': 'convertible', 'modelName': '1 Series Convertible'}, 
                         'name': '135is 2dr Convertible (3.0L 6cyl Turbo 6M)', 'id': 200428727}
                        ], 
                       'states': ['USED'], 'id': 200421700, 'year': 2013}
                      ], 
          'name': '1 Series', 'make': {'niceName': 'bmw', 'name': 'BMW', 'id': 200000081}
          }, #here is more to come, but I needed to crop it
          ]

After my own attempts failed, the code I've been using so far was written by @poke: Flattening Generic JSON List of Dicts or Lists in Python

def splitObj (obj, prefix = None):
    '''
    Split the object, returning a 3-tuple with the flat object, optionally
    followed by the key for the subobjects and a list of those subobjects.
    '''
    # copy the object, optionally add the prefix before each key
    new = obj.copy() if prefix is None else { '{}_{}'.format(prefix, k): v for k, v in obj.items() }

    # try to find the key holding the subobject or a list of subobjects
    for k, v in new.items():
        # list of subobjects
        if isinstance(v, list):
            del new[k]
            return new, k, v
        # or just one subobject
        elif isinstance(v, dict):
            del new[k]
            return new, k, [v]
    return new, None, None

def flatten (data, prefix = None):
    '''
    Flatten the data, optionally with each key prefixed.
    '''
    # iterate all items
    for item in data:
        # split the object
        flat, key, subs = splitObj(item, prefix)

        # just return fully flat objects
        if key is None:
            yield flat
            continue

        # otherwise recursively flatten the subobjects
        for sub in flatten(subs, key):
            sub.update(flat)
            yield sub

I get the following error:

AttributeError: 'str' object has no attribute 'items'

It is caused by 'states': ['USED'].

I don't know how to handle that. The value of the key 'states' can be stored as a list.

I hope someone can help me.

PS: This is a follow-up to my earlier post Python: Write Nested JSON as multiple elements in List
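The traceback arises because @poke's `flatten` treats every list as a list of subobjects: for `'states': ['USED']` it recurses into `['USED']` and calls `splitObj('USED', 'states')`, and a string has no `.items()`. Below is a minimal sketch of a patched pair, assuming that a list which does not consist only of dicts may simply be kept as a value. The only real change is the list test in `splitObj`; the rest follows @poke's structure and is not his code.

```python
def splitObj(obj, prefix=None):
    '''
    Split a dict into (flat part, key of the first nested part, list of subobjects).
    Lists that are not made up entirely of dicts (e.g. ['USED']) stay as plain values.
    '''
    # copy the object, optionally prefixing each key
    if prefix is None:
        new = dict(obj)
    else:
        new = {'{}_{}'.format(prefix, k): v for k, v in obj.items()}

    # iterate over a snapshot, so deleting from `new` is safe
    for k, v in list(new.items()):
        if isinstance(v, dict):
            del new[k]
            return new, k, [v]
        if isinstance(v, list) and v and all(isinstance(e, dict) for e in v):
            del new[k]
            return new, k, v
    return new, None, None


def flatten(data, prefix=None):
    '''Yield one flat dict per leaf subobject, merged with its parents' flat keys.'''
    for item in data:
        flat, key, subs = splitObj(item, prefix)
        if key is None:
            yield flat
            continue
        for sub in flatten(subs, key):
            sub.update(flat)
            yield sub
```

Like the original, this extracts only the first nested value per level, so a second nested value in the same dict stays nested in every row (in the question's data, `make` comes after `years` and keeps its dict). Answer 0 below targets exactly that case.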

3 answers:

Answer 0 (score: 0)

Here is my solution for splitObj:

def splitObj (obj, prefix = None):
    '''
    Split the object, returning a 3-tuple with the flat object, optionally
    followed by the key for the subobjects and a list of those subobjects.
    obj needs to be a dictionary
    '''
    # copy the object, optionally add the prefix before each key
    new = obj.copy() if prefix is None or prefix=="NotFlat" else { '{}_{}'.format(prefix, k): v for k, v in obj.items() }

    cL = 0
    cD = 0
    # count the values that are lists (cL) and dicts (cD)
    for k, v in new.items():
        if isinstance(v, list):
            cL += 1
        elif isinstance(v, dict):
            cD += 1

    # try to find the key holding the subobject or a list of subobjects
    for k, v in new.items():
        # list of subobjects
        if isinstance(v, list):
            # treat an empty list like a list holding one empty string
            if not v:
                v = [""]
            if (cD+cL) <=1:
                if not isinstance(v[0], str):
                    del new[k]
                    return new, k, v
                else:
                    # list containing only strings: store it as one joined string
                    new[k] = ", ".join(v)
                    return new, None, None
            else:
                if not isinstance(v[0], str):
                    del new[k]
                    # flatten the rest of the object and keep its first variant
                    for x in flatten([new]):
                        newOut = x
                        break
                    return newOut, k, v
                else:
                    # list containing only strings: join it, then use "NotFlat"
                    # to run the loop again on the other nested values without adding a prefix
                    new[k] = ", ".join(v)
                    return None, "NotFlat", [new]

        # or just one subobject
        elif isinstance(v, dict):
            del new[k]
            if (cD+cL) <=1:
                return new, k, [v]
            # flatten the rest of the object and keep its first variant
            for x in flatten([new]):
                newOut = x
                break
            return newOut, k, [v]
    return new, None, None

And here is flatten:

import logging

def flatten (data, prefix = None):
    '''
    Flatten the data, optionally with each key prefixed.
    '''
    # iterate all items
    for item in data:
        # split the object
        flat, key, subs = splitObj(item, prefix)

        # just return fully flat objects
        if key is None and flat is not None:
            yield flat
            continue

        # otherwise recursively flatten the subobjects
        try:
            for sub in flatten(subs, key):
                # flat is None when splitObj asked for a "NotFlat" re-run
                if flat is not None:
                    sub.update(flat)
                yield sub
        except TypeError as e:
            logging.getLogger(__name__).error("ERR: TypeError " + str(e))

Answer 1 (score: 0)

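Although this is not a generalized function, consider iterating through each nested element to get flat output for a database import or a flat-file (csv, txt) export. Since the JSON file consists of a mix of dicts and lists, handle each level accordingly: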


The code produces one row per style, repeating the parent values (model, make, year) in every row:

data = input  # the question's list of dicts (e.g. as loaded with json.load)

items = []
for outer in data:
    inner = [''] * 15
    for outerk, outerv in outer.items():
        inner[0] = outer['states'][0]
        inner[1] = outer['niceName']
        inner[2] = outer['id']
        inner[3] = outer['make']['niceName']
        inner[4] = outer['make']['name']
        inner[5] = outer['make']['id']
        if outerk == 'years':
            for yri in outer[outerk]:
                for yrk, yrv in yri.items():
                    inner[6] = yri['states'][0]
                    inner[7] = yri['id']
                    inner[8] = yri['year']
                    if yrk == 'styles':
                        for stylei in yri[yrk]:
                            inner[9] = stylei['trim']
                            inner[10] = stylei['name']
                            inner[11] = stylei['id']
                            inner[12] = stylei['submodel']['body']
                            inner[13] = stylei['submodel']['niceName']
                            inner[14] = stylei['submodel']['modelName']

                            # copy all 15 fields, since inner is reused for the next style
                            items.append(list(inner))

for i in items:
    print(i)
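Since each row in `items` is a plain list with fixed positions, it can go straight into a table with `executemany`. A minimal sketch using the standard-library `sqlite3` module and an in-memory database, assuming each row carries all 15 fields `inner[0]` to `inner[14]`; the table name `styles` and all column names are hypothetical, chosen to mirror those positions:

```python
import sqlite3

# Hypothetical column names, one per position inner[0] ... inner[14]
COLUMNS = ['model_state', 'model_nice_name', 'model_id',
           'make_nice_name', 'make_name', 'make_id',
           'year_state', 'year_id', 'model_year',
           'style_trim', 'style_name', 'style_id',
           'body', 'submodel_nice_name', 'submodel_model_name']


def write_rows(conn, rows):
    '''Create the table if needed, then insert every 15-field row in one call.'''
    conn.execute('CREATE TABLE IF NOT EXISTS styles ({})'.format(', '.join(COLUMNS)))
    placeholders = ', '.join('?' * len(COLUMNS))  # "?, ?, ..., ?"
    conn.executemany('INSERT INTO styles VALUES ({})'.format(placeholders), rows)
    conn.commit()


# One row in the shape the loop produces (values taken from the question's data)
sample = [['USED', '1-series', 'BMW_1_Series', 'bmw', 'BMW', 200000081,
           'USED', 100524709, 2008, '128i', '128i 2dr Convertible (3.0L 6cyl 6M)',
           100994560, 'Convertible', 'convertible', '1 Series Convertible']]

conn = sqlite3.connect(':memory:')
write_rows(conn, sample)
stored = conn.execute('SELECT model_year, style_trim, body FROM styles').fetchall()
# stored == [(2008, '128i', 'Convertible')]
```

The `?` placeholders let the driver do the quoting; with another DB-API driver the placeholder style may differ (psycopg2, for example, uses `%s`).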

Answer 2 (score: 0)

Rethinking the problem

It is often easier to find a solution to a more general problem, so let's first restate the problem.

The input is a JSON file that describes a set of objects.

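An object is redefined as either an atom (a string or a number) or a dict whose values are objects. Lists are used to represent alternatives (i.e. any element of the list can take the place of the list). For example, {a: [1, 2]} means that a can be 1 or 2.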

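The output should be a list of objects that contain no alternatives. In addition, the objects should be flat, i.e. they should be dicts whose values are atoms and whose keys describe the path to the value in the original object.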

My solution handles alternatives and flattening separately.

Normalising

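The function normalise below takes a JSON-compatible Python object (the kind you would pass to json.dumps or get back from json.loads) and yields a sequence of dicts. Note that the input and the output of normalise have the same semantics and describe the same set of objects. The output is merely normalised in the sense that it contains alternatives only at the top level. Database people would call this denormalised, because that is undesirable in a relational database.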

normalise always returns a sequence of objects, and it is implemented as a generator to keep memory usage low.

normalise distinguishes the following cases:

  • An atom as input means there is only one possibility, so the atom is yielded (this is like returning a list containing the atom).
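  • A list means a choice between alternatives, each of which may contain further alternatives. So normalise yields all elements of its normalised elements (this is like concatenating lists).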
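  • A dict means we have to consider all combinations of the alternatives of the individual keys. So the Cartesian product of those alternatives is yielded.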
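The dict case is easiest to see on a tiny made-up input in which each key has two alternatives; `itertools.product` pairs them up in the same way the dict branch of `normalise` does:

```python
import itertools

# made-up input: 'a' can be 1 or 2, 'b' can be 3 or 4
alternatives = {'a': [1, 2], 'b': [3, 4]}
keys = list(alternatives)

# one object per combination: the Cartesian product of the alternatives
objects = [dict(zip(keys, combo)) for combo in itertools.product(*alternatives.values())]
# objects == [{'a': 1, 'b': 3}, {'a': 1, 'b': 4}, {'a': 2, 'b': 3}, {'a': 2, 'b': 4}]
```

With n keys of k alternatives each this gives k**n objects, which is why `normalise` yields them lazily instead of building a list.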

Here is the code:

import itertools

def normalise(x):
    if isinstance(x, dict):
        keys = x.keys()
        values = (normalise(i) for i in x.values())
        for i in itertools.product(*values):
            yield dict(zip(keys, i))
    elif isinstance(x, list):
        #if not x:           # uncomment for "LEFT JOIN" behaviour
        #    yield None
        for i in x:
            yield from normalise(i)
    else:
        yield x

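If an object contains any empty list, this code does not yield that object, because there is no possible value for it. This is like an SQL "INNER JOIN". From Bert's answer it looks like he wants "LEFT JOIN" behaviour (i.e. some default value). To get that, just uncomment the two lines.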
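Rather than toggling comments, the two behaviours can be selected with a parameter. A sketch of the same generator with a hypothetical `left_join` keyword (not part of the answer); when it is set, an empty list contributes a single `None`:

```python
import itertools


def normalise(x, left_join=False):
    '''Yield every alternative-free variant of x; lists are alternatives, dicts combine.'''
    if isinstance(x, dict):
        keys = list(x.keys())
        values = (normalise(v, left_join) for v in x.values())
        for combo in itertools.product(*values):
            yield dict(zip(keys, combo))
    elif isinstance(x, list):
        if not x and left_join:
            yield None  # "LEFT JOIN": keep the parent, with None for the missing value
        for item in x:
            yield from normalise(item, left_join)
    else:
        yield x


record = {'name': '1 Series', 'years': []}
as_inner_join = list(normalise(record))                 # []
as_left_join = list(normalise(record, left_join=True))  # [{'name': '1 Series', 'years': None}]
```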

Pseudo-flattening

The objects produced by normalise still have the original (nested) dict structure. They could be flattened with the code from the other discussions.
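Because a normalised object contains no lists, flattening it needs only a short recursion over dicts. A minimal sketch (the name `flatten_keys` is hypothetical), joining keys with `.` so the results use the same dotted paths as `DictWrapper`:

```python
def flatten_keys(obj, prefix='', sep='.'):
    '''Return a flat dict whose keys are sep-joined paths into the nested dict obj.'''
    flat = {}
    for k, v in obj.items():
        path = prefix + sep + k if prefix else k
        if isinstance(v, dict):
            flat.update(flatten_keys(v, path, sep))
        else:
            flat[path] = v
    return flat


obj = {'name': '1 Series', 'make': {'name': 'BMW', 'id': 200000081},
       'years': {'year': 2008, 'styles': {'trim': '128i'}}}
# flatten_keys(obj) == {'name': '1 Series', 'make.name': 'BMW', 'make.id': 200000081,
#                       'years.year': 2008, 'years.styles.trim': '128i'}
```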

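However, the OP wants to insert the objects into a database. So he most likely doesn't need the list of keys of the flattened dict; he only needs a function that returns the value for a given path.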

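This can be done by creating a wrapper object for the dict with a __getitem__ method. The wrapper can also return a default value for paths that don't exist.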

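class DictWrapper:
    '''Read-only view of a nested dict that accepts sep-joined paths as keys.'''
    def __init__(self, d, sep='.'):
        self.d = d
        self.sep = sep

    def __getitem__(self, key):
        ret = self.d
        try:
            for k in key.split(self.sep):
                ret = ret[k]
            return ret
        except (KeyError, TypeError):
            # missing key, or a path that runs into an atom: treat as non-existent
            return None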

An SQL insert could look like this (tested with psycopg2):

# cur: an open psycopg2 cursor
for i in normalise(input):
    cur.execute('insert into mytable (year) VALUES (%(years.year)s)', DictWrapper(i))
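The same idea with the standard-library `sqlite3` module needs a small change: `sqlite3` expects a real `dict` for named placeholders, and SQLite parameter names are plain identifiers, so a name like `years.year` should not be used there. A sketch that builds a plain dict of column values per object, with a hand-written object standing in for one item yielded by `normalise`; the table `mytable`, its columns, the `PATHS` mapping and the `lookup` helper are all assumptions. Looking paths up as tuples also avoids the `sep` restriction of `DictWrapper`.

```python
import sqlite3

# Hypothetical mapping: column name -> path of keys into a normalised object
PATHS = {'model_year': ('years', 'year'),
         'style_trim': ('years', 'styles', 'trim'),
         'make_name': ('make', 'name')}


def lookup(obj, path):
    '''Follow path through nested dicts; return None if any step is missing.'''
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


# Hand-written stand-in for one object yielded by normalise(input)
obj = {'name': '1 Series', 'make': {'name': 'BMW'},
       'years': {'year': 2008, 'styles': {'trim': '128i'}}}

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE mytable (model_year INTEGER, style_trim TEXT, make_name TEXT)')
params = {column: lookup(obj, path) for column, path in PATHS.items()}  # a real dict
conn.execute('INSERT INTO mytable (model_year, style_trim, make_name) '
             'VALUES (:model_year, :style_trim, :make_name)', params)
rows = conn.execute('SELECT model_year, style_trim, make_name FROM mytable').fetchall()
# rows == [(2008, '128i', 'BMW')]
```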

Implementation details

  • This implementation sacrifices some runtime performance for the sake of clarity.

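  • Abstract base classes could be used instead of list and dict. However, this could be problematic, because str is a Sequence but should be treated as an atom.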

  • DictWrapper only works correctly if sep does not occur in any dict key.

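  • normalise does not filter out duplicates. This could be done by using sets and named tuples instead of lists and dicts, but then the whole result would have to be held in memory. It is better to filter out duplicates at the database level.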

  • To keep memory usage to a minimum, the JSON should be read lazily.
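If duplicates should nevertheless be dropped in Python, here is a minimal sketch for flat rows. It assumes every value is hashable (true for the atoms left after flattening) and uses a set of sorted `(key, value)` tuples instead of named tuples. `unique_rows` is a hypothetical helper, and its set of seen rows grows with the number of distinct rows, which is the memory cost mentioned above.

```python
def unique_rows(rows):
    '''Yield each flat dict the first time its (key, value) contents are seen.'''
    seen = set()
    for row in rows:
        # sorting by key makes the fingerprint independent of insertion order
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            yield row


rows = [{'year': 2008, 'trim': '128i'},
        {'trim': '128i', 'year': 2008},   # same contents, different key order
        {'year': 2009, 'trim': '128i'}]
# list(unique_rows(rows)) keeps rows[0] and rows[2]
```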