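Flattening a nested list inside a dictionary to CSV

Time: 2018-12-18 01:41:12

Tags: python json api

I am working out how to use Python to flatten a nested list that sits inside a dictionary, where the dictionaries are themselves elements of a list.

Example below: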

[
 {
    "id": 8,
    "category": {
        "id": 0,
        "name": "lion"
    },
    "name": "Leon",
    "photoUrls": [
        "123",
        "444",
    ],
    "tags": [
        {
            "id": 1,
            "name": "TagLion"
        },
        {
            "id": 2,
            "name": "KingOfTheJungle"
        }
    ],
},

{
    "id": 83,
    "category": {
        "id": 0,
        "name": "dog UPDATED"
    },
    "name": "Buff",
    "photoUrls": [
        "333",
    ],
    "tags": [
        {
            "id": 1,
            "name": "TagNumber1UPDATED"
        },
        {
            "id": 2,
            "name": "DogWithStickUPDATED"
        }
    ],
}
]

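I want to write the sample above (which is what an API returns) to CSV. The catch is "tags", which is a nested list. I would like the result flattened into CSV format like this: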

id | category                         | name | photoUrls | tags
 8 | {'id': 0, 'name': 'lion'}        | Leon | 123       | {'id': 1, 'name': 'TagLion'}
 8 | {'id': 0, 'name': 'lion'}        | Leon | 123       | {'id': 2, 'name': 'KingOfTheJungle'}
83 | {'id': 0, 'name': 'dog UPDATED'} | Buff | 333       | {'id': 1, 'name': 'TagNumber1UPDATED'}
83 | {'id': 0, 'name': 'dog UPDATED'} | Buff | 333       | {'id': 2, 'name': 'DogWithStickUPDATED'}

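How can I do this in Python? Ideally this would be driven by configuration: when loading into the CSV, Python would look up the configuration to decide that the "tags" array should be flattened.

Edit: I also want to flatten the photoUrls column, which is an array, by joining its values with a pipe rather than splitting them into separate rows, as shown below.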

id | category                         | name | photoUrls | tags
 8 | {'id': 0, 'name': 'lion'}        | Leon | 123|444   | {'id': 1, 'name': 'TagLion'}
 8 | {'id': 0, 'name': 'lion'}        | Leon | 123|444   | {'id': 2, 'name': 'KingOfTheJungle'}
83 | {'id': 0, 'name': 'dog UPDATED'} | Buff | 333       | {'id': 1, 'name': 'TagNumber1UPDATED'}
83 | {'id': 0, 'name': 'dog UPDATED'} | Buff | 333       | {'id': 2, 'name': 'DogWithStickUPDATED'}

2 answers:

Answer 0: (score: 1)

You can use the pandas package for this:

Code:

import pandas as pd

data = [] # your list is here

df = pd.DataFrame(data)

# expand 'tags' column into multiple rows
tags = df.apply(lambda x: pd.Series(x['tags']), axis=1).stack().reset_index(level=1, drop=True)
tags.name = 'tags'
df = df.drop('tags', axis=1).join(tags)

print(df)

Prints:

                           category  id  name photoUrls                                      tags
0         {'id': 0, 'name': 'lion'}   8  Leon     [123]              {'id': 1, 'name': 'TagLion'}
0         {'id': 0, 'name': 'lion'}   8  Leon     [123]      {'id': 2, 'name': 'KingOfTheJungle'}
1  {'id': 0, 'name': 'dog UPDATED'}  83  Buff     [333]    {'id': 1, 'name': 'TagNumber1UPDATED'}
1  {'id': 0, 'name': 'dog UPDATED'}  83  Buff     [333]  {'id': 2, 'name': 'DogWithStickUPDATED'}
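As a possible alternative to the apply/stack approach above: pandas 0.25 and later provide DataFrame.explode, which turns each element of a list-valued column into its own row. The sketch below assumes such a pandas version is installed; the sample records are copied from the question, including the second photo URL.

```python
import pandas as pd

# Sample records shaped like the API response in the question.
data = [
    {"id": 8, "category": {"id": 0, "name": "lion"}, "name": "Leon",
     "photoUrls": ["123", "444"],
     "tags": [{"id": 1, "name": "TagLion"},
              {"id": 2, "name": "KingOfTheJungle"}]},
    {"id": 83, "category": {"id": 0, "name": "dog UPDATED"}, "name": "Buff",
     "photoUrls": ["333"],
     "tags": [{"id": 1, "name": "TagNumber1UPDATED"},
              {"id": 2, "name": "DogWithStickUPDATED"}]},
]

df = pd.DataFrame(data)

# One output row per element of each 'tags' list; the other columns repeat.
exploded = df.explode("tags")

print(exploded[["id", "name", "tags"]])
```

The original index values are repeated for the new rows, just as in the printed output above; call reset_index(drop=True) if a fresh index is preferred.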

To dump the result to CSV, you can use the .to_csv() method.
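A minimal sketch of the CSV step, which also covers the pipe-joined photoUrls column requested in the question's edit. The trimmed two-row frame and the in-memory buffer are assumptions made to keep the example short and free of side effects; pass a path such as 'results.csv' to to_csv to write a real file.

```python
import io

import pandas as pd

# Trimmed-down records: only the columns needed to show the idea.
df = pd.DataFrame([
    {"id": 8, "name": "Leon", "photoUrls": ["123", "444"]},
    {"id": 83, "name": "Buff", "photoUrls": ["333"]},
])

# Collapse each photoUrls list into a single pipe-separated string.
df["photoUrls"] = df["photoUrls"].apply(lambda urls: "|".join(urls))

# index=False keeps the DataFrame index out of the CSV.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```

Because the pipe is not the CSV delimiter, the joined value stays in a single unquoted cell.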


You can also extract the "expand column" logic into a separate function and reuse it:

def expand_column(df, column_name):
    c = df.apply(lambda x: pd.Series(x[column_name]), axis=1).stack().reset_index(level=1, drop=True)
    c.name = column_name
    return df.drop(column_name, axis=1).join(c)

Usage:

df = pd.DataFrame(data)
df = expand_column(df, 'tags')
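The question also asks for the flattening to be driven by configuration. One way this might look is sketched below: FLATTEN_CONFIG and flatten are hypothetical names that appear in neither the question nor pandas, and the sketch uses DataFrame.explode (pandas 0.25+) instead of expand_column, so it is a variation on the answer and not the same code.

```python
import pandas as pd

# Hypothetical configuration: which list columns become extra rows and
# which are joined into one string (column name -> separator).
FLATTEN_CONFIG = {
    "explode": ["tags"],
    "join": {"photoUrls": "|"},
}

data = [
    {"id": 8, "category": {"id": 0, "name": "lion"}, "name": "Leon",
     "photoUrls": ["123", "444"],
     "tags": [{"id": 1, "name": "TagLion"},
              {"id": 2, "name": "KingOfTheJungle"}]},
    {"id": 83, "category": {"id": 0, "name": "dog UPDATED"}, "name": "Buff",
     "photoUrls": ["333"],
     "tags": [{"id": 1, "name": "TagNumber1UPDATED"},
              {"id": 2, "name": "DogWithStickUPDATED"}]},
]


def flatten(df, config):
    """Return a copy of df flattened according to config."""
    result = df.copy()
    # Join list columns into a single delimited string per row.
    for column, sep in config.get("join", {}).items():
        result[column] = result[column].apply(
            lambda values, s=sep: s.join(map(str, values))
        )
    # Expand list columns into one row per element.
    for column in config.get("explode", []):
        result = result.explode(column)
    return result.reset_index(drop=True)


flat = flatten(pd.DataFrame(data), FLATTEN_CONFIG)
print(flat)
```

Keeping the column names in a plain dictionary means the same function can be reused for other API responses by changing only the configuration.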

Answer 1: (score: 1)

You can use a nested comprehension:

import csv
d = [{'id': 8, 'category': {'id': 0, 'name': 'lion'}, 'name': 'Leon', 'photoUrls': ['123'], 'tags': [{'id': 1, 'name': 'TagLion'}, {'id': 2, 'name': 'KingOfTheJungle'}]}, {'id': 83, 'category': {'id': 0, 'name': 'dog UPDATED'}, 'name': 'Buff', 'photoUrls': ['333'], 'tags': [{'id': 1, 'name': 'TagNumber1UPDATED'}, {'id': 2, 'name': 'DogWithStickUPDATED'}]}]
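# join photoUrls with '|' so every row has exactly five columns, however many URLs there are
new_d = [[i['id'], i['category'], i['name'], '|'.join(i['photoUrls']), c] for i in d for c in i['tags']]
# newline='' is recommended by the csv module docs to avoid blank lines on Windows
with open('results.csv', 'w', newline='') as f: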
  write = csv.writer(f)
  write.writerows([['id', 'category', 'name', 'photoUrls', 'tags'], *new_d])

Output:

id,category,name,photoUrls,tags
8,"{'id': 0, 'name': 'lion'}",Leon,123,"{'id': 1, 'name': 'TagLion'}"
8,"{'id': 0, 'name': 'lion'}",Leon,123,"{'id': 2, 'name': 'KingOfTheJungle'}"
83,"{'id': 0, 'name': 'dog UPDATED'}",Buff,333,"{'id': 1, 'name': 'TagNumber1UPDATED'}"
83,"{'id': 0, 'name': 'dog UPDATED'}",Buff,333,"{'id': 2, 'name': 'DogWithStickUPDATED'}"
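For a standard-library-only variant that also handles the configuration and pipe-joining requests from the question, something along these lines might work. EXPLODE, JOIN and flatten_records are hypothetical names, and serialising nested values with json.dumps (instead of the Python repr shown in the output above) is an assumption made so the cells can be parsed back later.

```python
import csv
import io
import json

records = [
    {"id": 8, "category": {"id": 0, "name": "lion"}, "name": "Leon",
     "photoUrls": ["123", "444"],
     "tags": [{"id": 1, "name": "TagLion"},
              {"id": 2, "name": "KingOfTheJungle"}]},
    {"id": 83, "category": {"id": 0, "name": "dog UPDATED"}, "name": "Buff",
     "photoUrls": ["333"],
     "tags": [{"id": 1, "name": "TagNumber1UPDATED"},
              {"id": 2, "name": "DogWithStickUPDATED"}]},
]

# Hypothetical configuration.
EXPLODE = "tags"            # list column that becomes one row per element
JOIN = {"photoUrls": "|"}   # list columns joined into one string


def flatten_records(records, explode, join):
    """Build one flat dict per element of the 'explode' list."""
    rows = []
    for record in records:
        base = {}
        for key, value in record.items():
            if key == explode:
                continue
            if key in join:
                base[key] = join[key].join(str(v) for v in value)
            elif isinstance(value, (dict, list)):
                base[key] = json.dumps(value)
            else:
                base[key] = value
        # A missing or empty list still yields one row with an empty cell.
        for item in record.get(explode) or [None]:
            cell = "" if item is None else json.dumps(item)
            rows.append({**base, explode: cell})
    return rows


rows = flatten_records(records, EXPLODE, JOIN)

# Written to an in-memory buffer here; use
# open('results.csv', 'w', newline='') to write a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The csv module quotes the JSON cells automatically because they contain commas and double quotes, so reading the file back with csv.DictReader and json.loads recovers the original dictionaries.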