我有一个特定的嵌套字典,我无法弄清楚如何将其展平为数据框。它涉及为“ d”和“ p”字典的5个值中的每一个创建列,并对“ d”字典中的两个字典应用“ $ oid”和“ gi”。
test = json.loads(j)
test
[{
"_id": {
"$oid": "5bab"
},
"d": [
{
"d": [
17317,
16556,
9680,
55982,
45948
],
"h": 74.65,
"ts": "2018-09-26T21:02:19+00:00",
"p": [
61,
76,
137,
152,
122
],
"si": "9829563c95d0155f",
"t": 24.82,
"ti": "0000000000000000"
},
{
"d": [
17821,
17488,
9199,
56447,
44089
],
"h": 80.09,
"ts": "2018-09-26T21:02:19+00:00",
"p": [
61,
76,
137,
152,
122
],
"si": "a42fbc88a44a316f",
"t": 25.1,
"ti": "0000000000000000"
}
],
"gi": "GW-P1007"
}
这是我尝试获取的格式:
_id$oid, d1, d2, d3, d4, d5, h, ts, p1, p2, p3, p4, p5, si, t, ti, gi
5babf3dab512dd0165efd36c, 17317, 16556, 9680,55982,45948, 74.65,2018-09-26T21:02:19+00:00, 61,76, 137, 152, 122, 9829563c95d0155f, 24.82, 0000000000000000
但是,json_normalize()不会使外部的d字典变平,而是将所有内容放在一列中。
答案 0 :(得分:1)
我不确定您想如何处理$oid
和gi
。我假设各列将相同,因此您可能需要对此进行调整。您可能需要调整零件以获取所需的列名,但这或多或少会为您提供所需的内容。
但是我所做的是将整个字典压平为1行数据帧,然后遍历各列以创建单独的行。
test =[{
"_id": {
"$oid": "5bab"
},
"d": [
{
"d": [
17317,
16556,
9680,
55982,
45948
],
"h": 74.65,
"ts": "2018-09-26T21:02:19+00:00",
"p": [
61,
76,
137,
152,
122
],
"si": "9829563c95d0155f",
"t": 24.82,
"ti": "0000000000000000"
},
{
"d": [
17821,
17488,
9199,
56447,
44089
],
"h": 80.09,
"ts": "2018-09-26T21:02:19+00:00",
"p": [
61,
76,
137,
152,
122
],
"si": "a42fbc88a44a316f",
"t": 25.1,
"ti": "0000000000000000"
}
],
"gi": "GW-P1007"
}]
import json
import pandas as pd
import re
jsonStr = json.dumps(test[0])
def flatten_json(y):
out = {}
def flatten(x, name=''):
if type(x) is dict:
for a in x:
flatten(x[a], name + a + '_')
elif type(x) is list:
i = 0
for a in x:
flatten(a, name + str(i) + '_')
i += 1
else:
out[name[:-1]] = x
flatten(y)
return out
jsonObj = json.loads(jsonStr)
flat = flatten_json(jsonObj)
results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())
for item in columns_list:
try:
row_idx = re.findall(r'\_(\d+)\_', item )[0]
except:
special_cols.append(item)
continue
column = re.findall(r'\_\d+\_(.*)', item )[0]
column = column.replace('_', '')
row_idx = int(row_idx)
value = flat[item]
results.loc[row_idx, column] = value
for item in special_cols:
results[item] = flat[item]
输出:
print (results.to_string())
d0 d1 d2 d3 d4 h ts p0 p1 p2 p3 p4 si t ti _id_$oid gi
0 17317.0 16556.0 9680.0 55982.0 45948.0 74.65 2018-09-26T21:02:19+00:00 61.0 76.0 137.0 152.0 122.0 9829563c95d0155f 24.82 0000000000000000 5bab GW-P1007
1 17821.0 17488.0 9199.0 56447.0 44089.0 80.09 2018-09-26T21:02:19+00:00 61.0 76.0 137.0 152.0 122.0 a42fbc88a44a316f 25.10 0000000000000000 5bab GW-P1007