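From MongoDB to a pandas DataFrame

Time: 2019-06-23 16:41:52

Tags: python json mongodb pandas dataframe

My MongoDB database has a collection in which each record represents an edge (a road in the application I am building). Each record has the following form, where the first id is the id of the edge: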

{  
   "_id":{  
      "$oid":"5d0e7acc9c0bd9917006dd56"
   },
   "edge":{  
      "@id":":3659704519_0",
      "@traveltime":"2.37",
      "@timestep":"3",
      "lane":[  
         {  
            "@id":":3330548807_1_0",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
         },
         {  
            "@id":":3330548807_1_1",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666"
         }
      ]
   }
}

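I want to run some analysis on this data and would like to convert the records into a pandas data frame. The desired data frame skeleton looks like this: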

[Image: the desired skeleton for the data frame]

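I tried normalizing with pandas.io.json.json_normalize(d), but could not get the desired output.

As we can see, I have an array of lanes that can hold at most two lanes. It can also hold just a single lane. I would therefore like to split the lanes into two rows of the data frame.

Can someone suggest a solution?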

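1 Answer:

Answer 0: (score: 0)

If your data is nested like this, you have to flatten it first before you can create the data frame. Note that "lane" is a list when an edge has two lanes but a plain object when it has only one, so both cases need to be handled.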

import pandas

json = [
{
   "_id":{
      "$oid":"5d0e7acc9c0bd9917006dd56"
   },
   "edge":{
      "@id":":3659704519_0",
      "@traveltime":"2.37",
      "@timestep":"3",
      "lane": [
         {
            "@id":":3330548807_1_0",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
         },
         {
            "@id":":3330548807_1_1",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666"
         }
      ]
   }
},
{
   "_id":{
      "$oid":"5d0e7acc9c0bd9917006dd56"
   },
   "edge":{
      "@id":":3659704519_0",
      "@traveltime":"2.37",
      "@timestep":"3",
      "lane":{
            "@id":":3330548807_1_0",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
      }
   }
},
]

def ensure_list(obj):
    """Wrap a single lane dict in a list so both shapes can be iterated alike."""
    if isinstance(obj, list):
        return obj
    else:
        return [obj]

json_transformed = [
    {
        # edge attributes
        'edge_id': record['edge']['@id'],
        # lane attributes
        'lane_id': lane['@id'],
        # ...
    }
    for record in json
    for lane in ensure_list(record['edge']['lane'])
]

df = pandas.DataFrame(json_transformed)
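
Since the question mentions json_normalize, here is a rough sketch of how the same flattening could be done with it once every "lane" value has been wrapped in a list. It assumes pandas 1.0 or later, where the function is available as pandas.json_normalize. The helper name flatten_lanes, the "lane." column prefix, and the trimmed-down sample records (no "_id" or "@shape") are my own choices for illustration, not something from the original data.

```python
import pandas as pd

# Trimmed sample records shaped like the ones in the question: the first has
# a list of lanes, the second a single lane stored as a plain dict.
# "_id" and "@shape" are left out here only to keep the example short.
records = [
    {
        "edge": {
            "@id": ":3659704519_0",
            "@traveltime": "2.37",
            "@timestep": "3",
            "lane": [
                {"@id": ":3330548807_1_0", "@maxspeed": "1",
                 "@meanspeed": "79.99", "@occupancy": "0.00"},
                {"@id": ":3330548807_1_1", "@maxspeed": "1",
                 "@meanspeed": "79.99", "@occupancy": "0.00"},
            ],
        }
    },
    {
        "edge": {
            "@id": ":3659704519_0",
            "@traveltime": "2.37",
            "@timestep": "3",
            "lane": {"@id": ":3330548807_1_0", "@maxspeed": "1",
                     "@meanspeed": "79.99", "@occupancy": "0.00"},
        }
    },
]


def ensure_list(obj):
    """Wrap a single lane dict in a list; leave lists untouched."""
    return obj if isinstance(obj, list) else [obj]


def flatten_lanes(records):
    """Hypothetical helper: return one row per lane with the edge fields repeated."""
    # Build shallow copies so the input records are not modified.
    prepared = [
        {**r, "edge": {**r["edge"], "lane": ensure_list(r["edge"]["lane"])}}
        for r in records
    ]
    df = pd.json_normalize(
        prepared,
        record_path=["edge", "lane"],  # one output row per lane
        meta=[["edge", "@id"], ["edge", "@traveltime"], ["edge", "@timestep"]],
        record_prefix="lane.",  # keeps lane "@id" apart from the edge "@id"
    )
    # All values arrive as strings, so convert the numeric columns explicitly.
    numeric_columns = [
        "lane.@maxspeed", "lane.@meanspeed", "lane.@occupancy",
        "edge.@traveltime", "edge.@timestep",
    ]
    for column in numeric_columns:
        df[column] = pd.to_numeric(df[column])
    return df


df = flatten_lanes(records)
print(df)
```

The list comprehension shown earlier gives you full control over the column names, while this variant saves typing when there are many lane attributes. Either way the values come out of MongoDB as strings, so a numeric conversion step is probably needed before any analysis.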