DataFrame将json列表扩展为Many Rows

时间:2019-08-21 14:34:00

标签: python pandas

我想知道是否有一种简洁,pythonic的方式做到这一点

  phone
0 {"brand":{"type":"android"},"names":[{"id":"1", "name":"a-1"},{"id":"2", "name":"a-2"}]}
1 {"brand":{"type":"iphone"},"names":[{"id":"3", "name":"i-1"},{"id":"4", "name":"i-2"}]}

我想将json字段扩展为数据字段,以获取此信息:

   type       id  name
0  android    1   a-1
1  android    2   a-2
2  iphone     3   i-1
3  iphone     4   i-2  


I have found a good solution:
def parser_expand_json(data):
    keys = []
    values = []
    for key in data:
        keys.append(key)
        values.append(data.get(key))

    return pd.Series(values, index=keys)

# that is it
def test():
    data = [{'brand': {'type': 'android'}, 'names': [{'id': '1', 'name': 'a-1'}, {'id': '2', 'name': 'a-2'}]},
            {'brand': {'type': 'iphone'}, 'names': [{'id': '3', 'name': 'i-1'}, {'id': '4', 'name': 'i-2'}]}]
    df = pd.DataFrame(data)

    # expand json list to N rows
    df = df.merge(df['names'].apply(pd.Series), right_index=True, left_index=True).drop('names', axis=1).melt(
        id_vars=['brand'], value_name='names').drop('variable', axis=1)

    """
                       brand                           names
    0  {u'type': u'android'}  {u'id': u'1', u'name': u'a-1'}
    1   {u'type': u'iphone'}  {u'id': u'3', u'name': u'i-1'}
    2  {u'type': u'android'}  {u'id': u'2', u'name': u'a-2'}
    3   {u'type': u'iphone'}  {u'id': u'4', u'name': u'i-2'}
    """
    print df

    # expand json key to columns name
    df = pd.concat([df, df['brand'].apply(parser_expand_json), df['names'].apply(parser_expand_json)], axis=1).drop(
        ['brand', 'names'], axis=1)

    """
          type id name
    0  android  1  a-1
    1   iphone  3  i-1
    2  android  2  a-2
    3   iphone  4  i-2
    """
    print df

1 个答案:

答案 0 :(得分:0)

使用列表手动构建具有所需结构的新DataFrame的解决方案:

import pandas as pd

json = [
  {"brand":{"type":"android"},"names":[{"id":"1", "name":"a-1"},{"id":"2", "name":"a-2"}]},
  {"brand":{"type":"iphone"},"names":[{"id":"3", "name":"i-1"},{"id":"4", "name":"i-2"}]}
  ]

json_data = {'phone': json}

df_1 = pd.DataFrame(json_data)

type_list = []
id_list = []
name_list = []

for row in df_1.phone:
    for item in row['names']:
        type_list.append(row['brand']['type'])
        id_list.append(item['id'])
        name_list.append(item['name'])

data = {'type':type_list, 'id':id_list, 'name':name_list}

df_2 = pd.DataFrame(data)

要使用json_normalize(),我们必须首先将json重组为所需的列结构。这种情况下的解决方案如下:

import pandas as pd
from pandas.io.json import json_normalize

json = [
  {"brand":{"type":"android"},"names":[{"id":"1", "name":"a-1"},{"id":"2", "name":"a-2"}]},
  {"brand":{"type":"iphone"},"names":[{"id":"3", "name":"i-1"},{"id":"4", "name":"i-2"}]}
  ]

json_mod = []
for row in json:
    for item in row['names']:
        json_mod.append({'type':row['brand']['type'],'id':item['id'],'name':item['name']})

df_3 = json_normalize(json_mod)

df_2df_3都显示为:

  id name     type
0  1  a-1  android
1  2  a-2  android
2  3  i-1   iphone
3  4  i-2   iphone