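I have a collection in my MongoDB database in which each record represents an edge (a road in the application I am building). Each record has the following form, where the first id is the id of the edge: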
{
  "_id":{
    "$oid":"5d0e7acc9c0bd9917006dd56"
  },
  "edge":{
    "@id":":3659704519_0",
    "@traveltime":"2.37",
    "@timestep":"3",
    "lane":[
      {
        "@id":":3330548807_1_0",
        "@maxspeed":"1",
        "@meanspeed":"79.99",
        "@occupancy":"0.00",
        "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
      },
      {
        "@id":":3330548807_1_1",
        "@maxspeed":"1",
        "@meanspeed":"79.99",
        "@occupancy":"0.00",
        "@shape":"11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666"
      }
    ]
  }
}
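I want to run some analysis on this data and would like to convert the records into a pandas DataFrame. The desired DataFrame skeleton looks like this: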
[Image: the desired skeleton for the data frame]
I tried normalizing the data with pandas.io.json.json_normalize(d), but could not get the desired output.
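As you can see, each edge has an array of lanes that can hold at most two lanes, and it may also hold only one. I therefore want to split the lanes into separate rows of the data frame, so an edge with two lanes becomes two rows.
Can anyone suggest a solution?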
Answer
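When the data is nested like yours, you have to flatten it before you can create a data frame. Note that a single lane may be stored as a plain object rather than a one-element list, so the code below wraps it in a list before iterating.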
import pandas

# Sample records: the first edge has a list of two lanes, the second has a
# single lane stored as an object instead of a list.
json = [
    {
        "_id":{
            "$oid":"5d0e7acc9c0bd9917006dd56"
        },
        "edge":{
            "@id":":3659704519_0",
            "@traveltime":"2.37",
            "@timestep":"3",
            "lane": [
                {
                    "@id":":3330548807_1_0",
                    "@maxspeed":"1",
                    "@meanspeed":"79.99",
                    "@occupancy":"0.00",
                    "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
                },
                {
                    "@id":":3330548807_1_1",
                    "@maxspeed":"1",
                    "@meanspeed":"79.99",
                    "@occupancy":"0.00",
                    "@shape":"11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666"
                }
            ]
        }
    },
    {
        "_id":{
            "$oid":"5d0e7acc9c0bd9917006dd56"
        },
        "edge":{
            "@id":":3659704519_0",
            "@traveltime":"2.37",
            "@timestep":"3",
            "lane":{
                "@id":":3330548807_1_0",
                "@maxspeed":"1",
                "@meanspeed":"79.99",
                "@occupancy":"0.00",
                "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
            }
        }
    },
]

def ensure_list(obj):
    # Wrap a single lane object in a list so both shapes can be iterated the same way
    if isinstance(obj, list):
        return obj
    else:
        return [obj]
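
# One output row per lane, with the parent edge's attributes repeated on each row
json_transformed = [
    {
        # edge attributes
        'edge_id': record['edge']['@id'],
        # lane attributes
        'lane_id': lane['@id'],
        # ... add the other edge and lane fields you need in the same way
    }
    for record in json
    for lane in ensure_list(record['edge']['lane'])
]

df = pandas.DataFrame(json_transformed)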
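An alternative sketch, assuming pandas 1.0 or later (where json_normalize is exposed as pandas.json_normalize): once every "lane" field is a list, json_normalize's record_path and meta arguments can produce one row per lane with the edge attributes repeated, which is what the question attempted. json_normalize expects record_path to point at a list, which is why the lanes are wrapped first. The top-level "oid" key and the names normalised and numeric_cols are hypothetical, introduced only for this illustration. Because the source stores numbers as strings, the sketch also converts those columns with pd.to_numeric.

```python
import pandas as pd

# Sample records mirroring the question: the first edge has two lanes in a list,
# the second has a single lane stored as a plain object.
records = [
    {
        "_id": {"$oid": "5d0e7acc9c0bd9917006dd56"},
        "edge": {
            "@id": ":3659704519_0",
            "@traveltime": "2.37",
            "@timestep": "3",
            "lane": [
                {"@id": ":3330548807_1_0", "@maxspeed": "1", "@meanspeed": "79.99",
                 "@occupancy": "0.00",
                 "@shape": "11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"},
                {"@id": ":3330548807_1_1", "@maxspeed": "1", "@meanspeed": "79.99",
                 "@occupancy": "0.00",
                 "@shape": "11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666"},
            ],
        },
    },
    {
        "_id": {"$oid": "5d0e7acc9c0bd9917006dd56"},
        "edge": {
            "@id": ":3659704519_0",
            "@traveltime": "2.37",
            "@timestep": "3",
            "lane": {"@id": ":3330548807_1_0", "@maxspeed": "1", "@meanspeed": "79.99",
                     "@occupancy": "0.00",
                     "@shape": "11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"},
        },
    },
]


def ensure_list(obj):
    """Return obj unchanged if it is a list, otherwise wrap it in a one-element list."""
    return obj if isinstance(obj, list) else [obj]


# Build new records in which "lane" is always a list and the ObjectId is lifted
# to a top-level "oid" key (hypothetical name) so it can be passed as meta.
# The original records are left untouched.
normalised = [
    {
        "oid": r["_id"]["$oid"],
        "edge": {**r["edge"], "lane": ensure_list(r["edge"]["lane"])},
    }
    for r in records
]

# One row per lane; oid and the edge attributes are repeated on each lane row.
df = pd.json_normalize(
    normalised,
    record_path=["edge", "lane"],
    meta=["oid", ["edge", "@id"], ["edge", "@traveltime"], ["edge", "@timestep"]],
    record_prefix="lane.",
)

# The source stores numbers as strings; convert them before doing any analysis.
numeric_cols = ["lane.@maxspeed", "lane.@meanspeed", "lane.@occupancy",
                "edge.@traveltime", "edge.@timestep"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)

print(df[["edge.@id", "lane.@id", "lane.@meanspeed", "edge.@traveltime"]])
```

With the two sample records this gives three rows: two for the edge with a lane list and one for the edge with a single lane object. Meta paths such as ["edge", "@id"] are resolved relative to the same record that contains the lane list, which is why the top-level ObjectId is copied into its own key instead of being referenced as ["_id", "$oid"].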