Converting a difficult nested dictionary to a Pandas DataFrame

Asked: 2019-07-23 08:59:52

Tags: python pandas dataframe dictionary nested

This is my nasty nested dictionary:


data = [

{
 'type': 'a',
 'id': '3',
 'attributes': {'name': 'Alexander',
  'address': 'Ree 25',
  'postalCode': '3019 VM',
  'place': 'Amsterdam',
  'company': 'Pizza BV',
  'phoneNumbers': [{'description': 'general', 'phoneNumber': '+31104136911'}],
  'locationCode': 'DURTM',
  'website': 'http://www.pizzabv.nl',
  'primaryEmail': 'info@pizzabv.nl',
  'secondaryEmail': '',
  'geoLocation': {'type': 'Point',
    'coordinates': [16.309702884655767, 31.879930329139634]
 }
},
 'relationships': [],
 'links': {'self': 'www.homepage.nl'
  }
},

{
 'type': 'b',
 'id': '7',
 'attributes': {'name': 'Sam',
  'address': 'Zee 15',
  'postalCode': '2019 AM',
  'place': 'Groningen',
  'company': 'Salami BV',
  'phoneNumbers': [{'description': 'specific', 'phoneNumber': '+31404136121'}],
  'locationCode': 'SWSTM',
  'website': 'http://www.salamibv.nl',
  'primaryEmail': 'info@salamibv.nl',
  'secondaryEmail': '',
  'geoLocation': {'type': 'Point',
   'coordinates': [18.309702884655767, 34.879930329139634]
 }
},
 'relationships': [],
 'links': {'self': 'www.homepage.nl'
 }
}
]

This is the dataframe I want:


type | id | name | address | postalCode | ... | type | coordinates | relationships | links
...    ...   ...     ...        ...       ...    ...      ...            ...          ...

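So the various nested dictionaries have to move up one level. First, attributes has to be removed and its underlying values moved one level up.

description and phoneNumber have to move up one level as well, after which phoneNumbers can be removed.

Furthermore, all information belonging to one type/id combination should end up in a single row.

I have no idea how to do this. I tried several things, such as: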

import pandas as pd

terminals = pd.DataFrame.from_dict(data, orient='columns')
terminals.reset_index(level=0, inplace=True)
terminals.head()

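However, this leaves whole dictionaries inside the cells of the pandas dataframe.

I hope someone can help me out a bit.

2 answers:

Answer 0: (score: 0)

Define a function that parses each dictionary into a flattened structure, then apply it to every record before passing the result to the dataframe constructor: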

def parse(dict_):
    di = dict_.copy()  # shallow copy, so the top-level keys of the original dict are not modified

    # bring attributes up a level
    di.update(di['attributes'])
    del di['attributes']

    # etc...
    return di

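# build a list first so the constructor receives a list of dicts rather than a lazy map object
df = pd.DataFrame([parse(d) for d in data])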
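
One way the "# etc..." part could be filled in is sketched below. This is not the answerer's code: it assumes every record has the shape shown in the question, that only the first entry of phoneNumbers is wanted, and that the nested geoLocation type should get a different column name (the made-up name geoType) so it does not overwrite the top-level type. The helper name parse_full and the shortened sample record are illustrative only.

```python
import pandas as pd

def parse_full(record):
    """Flatten one record shaped like those in the question (illustrative sketch)."""
    flat = dict(record)  # shallow copy; the original record keeps its keys

    # bring the attributes up a level
    attrs = dict(flat.pop('attributes', {}))

    # lift description / phoneNumber out of the first phoneNumbers entry (assumption)
    phones = attrs.pop('phoneNumbers', [])
    if phones:
        attrs.update(phones[0])

    # lift geoLocation; 'geoType' is a hypothetical name chosen to avoid
    # clashing with the top-level 'type'
    geo = attrs.pop('geoLocation', {})
    if geo:
        attrs['geoType'] = geo.get('type')
        attrs['coordinates'] = geo.get('coordinates')

    flat.update(attrs)

    # replace the links dict by its 'self' value
    links = flat.pop('links', {})
    flat['links'] = links.get('self')
    return flat

# shortened stand-in for the data in the question
sample = [{
    'type': 'a',
    'id': '3',
    'attributes': {
        'name': 'Alexander',
        'phoneNumbers': [{'description': 'general', 'phoneNumber': '+31104136911'}],
        'geoLocation': {'type': 'Point', 'coordinates': [16.3, 31.8]},
    },
    'relationships': [],
    'links': {'self': 'www.homepage.nl'},
}]

df = pd.DataFrame([parse_full(r) for r in sample])
print(df.columns.tolist())
```

The coordinates and relationships cells still hold lists here; whether to split them into separate columns depends on what the final table should look like.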


Answer 1: (score: 0)

You have to denest the data. You can do that with the following recursive function:

def denest(x, parent=None, d=None):
    if d is None:
        d = {}
    for k, v in x.items():
        if isinstance(v, dict):
            denest(v, parent=(parent or []) + [k], d=d)
        elif isinstance(v, (list, tuple)):
            for j, item in enumerate(v):
                if isinstance(item, dict):
                    denest(item, parent=(parent or []) + [k, j], d=d)
                else:
                    d[tuple((parent or []) + [k, j])] = item
        else:
            d[tuple((parent or []) + [k])] = v
    return d

Then, assuming data is a list of dictionaries, you can simply create a dataframe as follows:

pd.DataFrame([denest(i) for i in data])
  (attributes, address) (attributes, company)  (attributes, geoLocation, coordinates, 0)  (attributes, geoLocation, coordinates, 1) (attributes, geoLocation, type) (attributes, locationCode) (attributes, name) (attributes, phoneNumbers, 0, description) (attributes, phoneNumbers, 0, phoneNumber) (attributes, place) (attributes, postalCode) (attributes, primaryEmail) (attributes, secondaryEmail)   (attributes, website) (id,)    (links, self) (type,)
0                Ree 25              Pizza BV                                  16.309703                                   31.87993                           Point                      DURTM          Alexander                                    general                               +31104136911           Amsterdam                  3019 VM            info@pizzabv.nl                                http://www.pizzabv.nl     3  www.homepage.nl       a
1                Zee 15             Salami BV                                  18.309703                                   34.87993                           Point                      SWSTM                Sam                                   specific                               +31404136121           Groningen                  2019 AM           info@salamibv.nl                               http://www.salamibv.nl     7  www.homepage.nl       b

If you like, you can rename the columns from here and/or convert the result to a multi-index dataframe, etc.
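
As a rough sketch of that last step, the snippet below turns tuple keys into dotted column names and, alternatively, into a MultiIndex. The records list is a hand-written stand-in for two denested records, and the helper names join_key and pad_key are made up for this illustration. Padding shorter keys with empty strings is just one possible convention for giving all keys the same depth.

```python
import pandas as pd

# hand-written stand-in for two denested records (tuple keys, as in the output above)
records = [
    {('attributes', 'name'): 'Alexander',
     ('attributes', 'geoLocation', 'type'): 'Point',
     ('id',): '3'},
    {('attributes', 'name'): 'Sam',
     ('attributes', 'geoLocation', 'type'): 'Point',
     ('id',): '7'},
]

# collect every key once, in order of first appearance
keys = []
for rec in records:
    for key in rec:
        if key not in keys:
            keys.append(key)

def join_key(key, sep='.'):
    """Join the parts of a tuple key into one flat column name."""
    return sep.join(str(part) for part in key)

def pad_key(key, depth):
    """Pad a tuple key with empty strings so all keys have the same length."""
    return tuple(str(part) for part in key) + ('',) * (depth - len(key))

rows = [[rec.get(key) for key in keys] for rec in records]

# option 1: flat, dotted column names
flat = pd.DataFrame(rows, columns=[join_key(key) for key in keys])

# option 2: MultiIndex columns, one level per nesting depth
depth = max(len(key) for key in keys)
multi = pd.DataFrame(
    rows,
    columns=pd.MultiIndex.from_tuples([pad_key(key, depth) for key in keys]),
)

print(flat.columns.tolist())
print(multi.columns.tolist())
```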