Python Pandas - Flatten nested JSON

Time: 2018-11-21 21:42:30

Tags: python json pandas geopandas

I am working with nested JSON data that I am trying to convert into a Pandas dataframe. The json_normalize function provides a way to do this.

{
"locations" : [ {
  "timestampMs" : "1542654",
  "latitudeE7" : 3777321,
  "longitudeE7" : -122423125,
  "accuracy" : 17,
  "altitude" : -10,
  "verticalAccuracy" : 2,
  "activity" : [ {
  "timestampMs" : "1542652",
  "activity" : [ {
    "type" : "STILL",
    "confidence" : 100
   } ]
  }]
 }]
}

I used the function to normalize "locations", but the nested "activity" part is not flattened.

Here is my attempt:

activity_data = json_normalize(d, 'locations', ['activity','type', 'confidence'], 
                               meta_prefix='Prefix.',
                               errors='ignore') 

DataFrame:

[{u'activity': [{u'confidence': 100, u'type': ...   -10.0   NaN 377777377   -1224229340 1542652023196   

The "activity" column still has nested elements, which I need unpacked into their own columns.

Any suggestions/tips would be much appreciated.

1 answer:

Answer 0 (score: 0)

Use recursion to flatten the nested dicts

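def flatten_json(nested_json: dict, exclude: tuple = ('',)) -> dict:
    """
    Flatten a nested dict (which may contain lists and dicts) into a
    single-level dict. Keys are joined with '_'; list items use their index.
    Keys listed in `exclude` are skipped at every level.
    """
    out = dict()

    def flatten(x, name: str = ''):
        if isinstance(x, dict):
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}_')
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, f'{name}{i}_')
        else:
            # Leaf value: strip the trailing '_' from the accumulated key.
            out[name[:-1]] = x

    flatten(nested_json)
    return out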
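
To make the key naming concrete, here is a minimal sketch that traces the same recursion on one small record. It is a generator-based restatement written for illustration, not the answer's code: the name iter_flat and the trimmed-down record are assumptions, and the exclude option is left out to keep it short.

```python
from typing import Any, Iterator, Tuple


def iter_flat(obj: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (flattened_key, leaf_value) pairs for nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_flat(value, f"{prefix}{key}_")
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from iter_flat(item, f"{prefix}{index}_")
    else:
        # Leaf: drop the trailing underscore added by the parent level.
        yield prefix[:-1], obj


# Trimmed-down version of one "locations" entry from the question.
record = {
    "timestampMs": "1542654",
    "activity": [
        {
            "timestampMs": "1542652",
            "activity": [{"type": "STILL", "confidence": 100}],
        }
    ],
}

flat = dict(iter_flat(record))
for key, value in flat.items():
    print(key, "=", value)
# timestampMs = 1542654
# activity_0_timestampMs = 1542652
# activity_0_activity_0_type = STILL
# activity_0_activity_0_confidence = 100
```

Each dict key and each list index becomes one segment of the column name, which is why a second entry in either "activity" list would show up as extra columns (activity_1_..., activity_0_activity_1_...) rather than extra rows. Empty dicts and lists contribute no key at all.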

Data:

  • To create the dataset, I used the given data.
  • data holds the JSON as a Python dict
data = {'locations': [{'accuracy': 17,'activity': [{'activity': [{'confidence': 100,'type': 'STILL'}],'timestampMs': '1542652'}],'altitude': -10,'latitudeE7': 3777321,'longitudeE7': -122423125,'timestampMs': '1542654','verticalAccuracy': 2},
                      {'accuracy': 17,'activity': [{'activity': [{'confidence': 100,'type': 'STILL'}],'timestampMs': '1542652'}],'altitude': -10,'latitudeE7': 3777321,'longitudeE7': -122423125,'timestampMs': '1542654','verticalAccuracy': 2},
                      {'accuracy': 17,'activity': [{'activity': [{'confidence': 100,'type': 'STILL'}],'timestampMs': '1542652'}],'altitude': -10,'latitudeE7': 3777321,'longitudeE7': -122423125,'timestampMs': '1542654','verticalAccuracy': 2}]}

Using flatten_json

import pandas as pd

df = pd.DataFrame([flatten_json(x) for x in data['locations']])

Output:

 accuracy  activity_0_activity_0_confidence activity_0_activity_0_type activity_0_timestampMs  altitude  latitudeE7  longitudeE7 timestampMs  verticalAccuracy
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
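
If one row per innermost activity is preferred over indexed columns, json_normalize itself can probably do the job when record_path points at the innermost list and meta lists the outer fields as full paths. The sketch below assumes pandas 1.0 or later (where the function is available as pd.json_normalize); the second location and its "WALKING" entry are made up purely for illustration.

```python
import pandas as pd

# Two locations; the second has two inner activity entries (illustrative data).
data = {
    "locations": [
        {
            "timestampMs": "1542654",
            "latitudeE7": 3777321,
            "activity": [
                {
                    "timestampMs": "1542652",
                    "activity": [{"type": "STILL", "confidence": 100}],
                }
            ],
        },
        {
            "timestampMs": "1542660",
            "latitudeE7": 3777321,
            "activity": [
                {
                    "timestampMs": "1542658",
                    "activity": [
                        {"type": "WALKING", "confidence": 60},
                        {"type": "STILL", "confidence": 40},
                    ],
                }
            ],
        },
    ]
}

activities = pd.json_normalize(
    data,
    # Walk locations -> activity -> activity; each innermost dict becomes a row.
    record_path=["locations", "activity", "activity"],
    # Outer fields to repeat on every row, given as paths from the top level.
    meta=[
        ["locations", "timestampMs"],
        ["locations", "latitudeE7"],
        ["locations", "activity", "timestampMs"],
    ],
)
print(activities)
```

The meta columns are named after their full path joined with ".", for example locations.activity.timestampMs, so they do not collide with the record's own fields.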