Cleaning up badly formatted JSON in Python

Time: 2018-06-20 15:10:08

Tags: python json syntax

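I have received a JSON file containing some data that needs to be analyzed. The data comes from an SQL database, so it would normally be structured in tables. However, when I received it, it looked like this: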

{'TimeStamp1': '2018-06-03 00:21:04', 'Owner1': 'Some owner', 'Description1': 'A description', 'TimeStamp2': '2018-06-03 00:22:15', 'Owner2': 'A new Owner', 'Description2': 'A new description'}

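...and so on. So there is only a single line/object holding all the data, with multiple keys that have almost identical names. How can I convert this in Python into something resembling the SQL layout, or: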

{'records': [
   {'TimeStamp': '2018-06-03 00:21:04', 'Owner': 'Some owner', 'Description': 'A description'},
   {'TimeStamp': '2018-06-03 00:22:15', 'Owner': 'A new Owner', 'Description': 'A new description'}
]}

while still guaranteeing that the correct owner ends up in the same row as its associated timestamp and description? :)

1 answer:

Answer 0: (score: 0)

Here is a simple approach. It can probably be optimized, but it should do what you want:

import re

def sanitize(d, keys):
    b = 0
    records = []

    # find the highest numeric suffix among the keys of d
    for key in d.keys():
        match = re.search(r"\d+$", key)
        if match and int(match.group()) > b:
            b = int(match.group())

    # go through the key numbers one at a time
    for i in range(1, b + 1):
        rec = {}

        # build one record dictionary per key number
        for key in keys:
            rec[key] = d[key + str(i)]
        records.append(rec)

    return records

The function is used like this:

data = {'TimeStamp1': '2018-06-03 00:21:04', 'Owner1': 'Some owner', 'Description1': 'A description', 'TimeStamp2': '2018-06-03 00:22:15', 'Owner2': 'A new Owner', 'Description2': 'A new description'}
k = ['TimeStamp', 'Owner', 'Description']
r = sanitize(data, k)

and returns:

[{'Owner': 'Some owner', 'TimeStamp': '2018-06-03 00:21:04', 'Description': 'A description'}, {'Owner': 'A new Owner', 'TimeStamp': '2018-06-03 00:22:15', 'Description': 'A new description'}]
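
If you would rather not pass the list of field names in by hand, the same idea can be sketched by grouping on the numeric suffix directly. This is only a rough variant, not a tested drop-in: the helper name group_by_suffix is made up for illustration, and it assumes every relevant key ends in its record number. It also wraps the list in a 'records' key, as in the question. Because each value is filed under the number taken from its own key, an owner can only land in the row with the same number as its timestamp and description.

```python
import re
from collections import defaultdict


def group_by_suffix(flat):
    """Hypothetical helper: turn {'Owner1': ..., 'TimeStamp1': ...} into one dict per number."""
    grouped = defaultdict(dict)
    for key, value in flat.items():
        # split e.g. 'TimeStamp12' into the name 'TimeStamp' and the number 12
        match = re.fullmatch(r"(.*?)(\d+)", key)
        if match is None:
            continue  # assumption: keys without a numeric suffix are not record fields
        name, number = match.group(1), int(match.group(2))
        grouped[number][name] = value
    # sort by record number so rows come out in ascending order
    return {"records": [grouped[n] for n in sorted(grouped)]}


data = {
    'TimeStamp1': '2018-06-03 00:21:04', 'Owner1': 'Some owner', 'Description1': 'A description',
    'TimeStamp2': '2018-06-03 00:22:15', 'Owner2': 'A new Owner', 'Description2': 'A new description',
}
result = group_by_suffix(data)
print(result["records"][1]["Owner"])  # prints: A new Owner
```

Unlike sanitize, this version does not raise a KeyError when a record is missing one of its fields; that record simply ends up with fewer keys, so check for gaps yourself if that matters.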