Importing JSON into a DataFrame and normalizing it

Time: 2018-10-19 18:57:12

Tags: python json pandas dataframe

I have the following JSON document, which I want to import into a DataFrame:

{
"agents": [
    {
        "core_build": "17",
        "core_version": "7.1.1",
        "distro": "win-x86-64",
        "groups": [
            {
                "id": 101819,
                "name": "O Laptops"
            }
        ],
        "id": 2198802,
        "ip": "x.x.x.x",
        "last_connect": 1539962159,
        "last_scanned": 1539373347,
        "linked_on": 1534964847,
        "name": "x1x1x1x1",
        "platform": "WINDOWS",
        "plugin_feed_id": "201810182051",
        "status": "on",
        "uuid": "ca8b941a-80cd-4c1c-8044-760e69781eb7"
    },
    {
        "core_build": "17",
        "core_version": "7.1.1",
        "distro": "win-x86-64",
        "groups": [
            {
                "id": 101839,
                "name": "G Personal"
            },
            {
                "id": 102037,
                "name": "W6"
            },
            {
                "id": 102049,
                "name": "MS8"
            }
        ],
        "id": 2097601,
        "ip": "x.x.x.x",
        "last_connect": 1539962304,
        "last_scanned": 1539437865,
        "linked_on": 1529677890,
        "name": "x2xx2x2x2",
        "platform": "WINDOWS",
        "plugin_feed_id": "201810181351",
        "status": "on",
        "uuid": "7e3ef1ff-4f08-445a-b500-e7ce3ca9a2f2"
    },
    {
        "core_build": "14",
        "core_version": "7.1.0",
        "distro": "win-x86-64",
        "id": 2234103,
        "ip": "x6x6x6x6x",
        "last_connect": 1537384290,
        "linked_on": 1537384247,
        "name": "x7x7x7x",
        "platform": "WINDOWS",
        "status": "off",
        "uuid": "0696ee38-402a-4866-b753-2816482dfce6"
    }],
"pagination": {
    "limit": 5000,
    "offset": 0,
    "sort": [
        {
            "name": "name",
            "order": "asc"
        }
    ],
    "total": 14416
 }
}

To do this, I wrote the following code:

import json
from pandas.io.json import json_normalize

with open('out.json') as f:
    data = json.load(f)

df = json_normalize(data, 'agents', [['groups', 'name']], errors='ignore')
print(df)

This unpacks every field under "agents" as-is (leaving "groups" as a multi-valued field) and adds a new field named "groups.name" that is empty (all values are NaN).
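
One plausible reason the "groups.name" column comes out empty: when record_path has a single level, json_normalize appears to look up each meta path on the object that contains the records (here the top-level document), not on each agent. The top-level document has no "groups" key, so with errors='ignore' every value becomes NaN. A minimal sketch on made-up data, assuming pandas 1.0 or later (where the function is available as pd.json_normalize):

```python
import pandas as pd

# Made-up document shaped like the one in the question
data = {
    "agents": [
        {"id": 1, "groups": [{"id": 10, "name": "A"}]},
        {"id": 2},
    ]
}

# The meta path ["groups", "name"] is looked up on `data` itself,
# which has no "groups" key, so errors="ignore" fills in NaN.
df = pd.json_normalize(data, "agents", [["groups", "name"]], errors="ignore")

print(df.columns.tolist())
print(df["groups.name"].isna().all())
```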

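I only want the fields under "agents" unpacked into the DataFrame, with the fields inside "groups" unpacked into separate columns ('core_build', 'core_version', 'distro', 'groups.name', 'id', 'ip', 'last_connect', 'last_scanned', 'linked_on', 'name', 'platform', 'plugin_feed_id', 'status', 'uuid').

How can I achieve this?

Edit: Running the following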

df = json_normalize(pd.concat([pd.DataFrame(i) for i in data['agents']]).to_dict('r'))

returns the error ValueError: If using all scalar values, you must pass an index
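
The error is likely triggered by the third agent, which has no "groups" list: pd.DataFrame broadcasts scalar values only when at least one value in the dict is list-like, and otherwise needs an explicit index. A small sketch with made-up records (the variable names are illustrative, not from the original code):

```python
import pandas as pd

with_groups = {"id": 1, "groups": [{"id": 10, "name": "A"}, {"id": 11, "name": "B"}]}
without_groups = {"id": 2}  # every value is a scalar

# The list under "groups" sets the length, so the scalar "id" is repeated
ok = pd.DataFrame(with_groups)

try:
    pd.DataFrame(without_groups)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Passing an index explicitly avoids the error for all-scalar dicts
fixed = pd.DataFrame(without_groups, index=[0])
```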

1 Answer:

Answer 0: (score: 0)

You can use pd.concat() with a list comprehension:

df = pd.concat([pd.DataFrame(i) for i in my_json['agents']])

Or, if you want to unpack the "groups" column, whose values are of type dict, into separate columns, try the following:

df = json_normalize(pd.concat([pd.DataFrame(i) for i in my_json['agents']]).to_dict('r'))
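
As an alternative sketch (not the method from the answer above), the rows can be built in plain Python before handing them to pandas, which also keeps agents that have no "groups" key. The helper name flatten_agents and the "groups." column prefix are assumptions chosen for illustration:

```python
import pandas as pd


def flatten_agents(data):
    """Return one row per (agent, group); agents without groups keep one row."""
    rows = []
    for agent in data["agents"]:
        # Agent-level fields, excluding the nested list
        base = {k: v for k, v in agent.items() if k != "groups"}
        # Fall back to a single empty group so the agent is not dropped
        for group in agent.get("groups") or [{}]:
            row = dict(base)
            row.update({f"groups.{k}": v for k, v in group.items()})
            rows.append(row)
    return pd.DataFrame(rows)


# Made-up sample shaped like the question's document
sample = {
    "agents": [
        {"id": 1, "name": "a", "groups": [{"id": 10, "name": "G1"}]},
        {"id": 2, "name": "b", "groups": [{"id": 11, "name": "G2"},
                                          {"id": 12, "name": "G3"}]},
        {"id": 3, "name": "c"},
    ]
}
df = flatten_agents(sample)
print(df)
```

With the question's file loaded as data, flatten_agents(data) should give one row per agent and group, with columns such as "groups.id" and "groups.name" next to the agent fields. Separately, newer pandas releases seem to reject the 'r' shorthand in to_dict and expect 'records' instead.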