Question

I have the following JSON document, which I want to import into a DataFrame:

{
"agents": [
    {
        "core_build": "17",
        "core_version": "7.1.1",
        "distro": "win-x86-64",
        "groups": [
            {
                "id": 101819,
                "name": "O Laptops"
            }
        ],
        "id": 2198802,
        "ip": "x.x.x.x",
        "last_connect": 1539962159,
        "last_scanned": 1539373347,
        "linked_on": 1534964847,
        "name": "x1x1x1x1",
        "platform": "WINDOWS",
        "plugin_feed_id": "201810182051",
        "status": "on",
        "uuid": "ca8b941a-80cd-4c1c-8044-760e69781eb7"
    },
    {
        "core_build": "17",
        "core_version": "7.1.1",
        "distro": "win-x86-64",
        "groups": [
            {
                "id": 101839,
                "name": "G Personal"
            },
            {
                "id": 102037,
                "name": "W6"
            },
            {
                "id": 102049,
                "name": "MS8"
            }
        ],
        "id": 2097601,
        "ip": "x.x.x.x",
        "last_connect": 1539962304,
        "last_scanned": 1539437865,
        "linked_on": 1529677890,
        "name": "x2xx2x2x2",
        "platform": "WINDOWS",
        "plugin_feed_id": "201810181351",
        "status": "on",
        "uuid": "7e3ef1ff-4f08-445a-b500-e7ce3ca9a2f2"
    },
    {
        "core_build": "14",
        "core_version": "7.1.0",
        "distro": "win-x86-64",
        "id": 2234103,
        "ip": "x6x6x6x6x",
        "last_connect": 1537384290,
        "linked_on": 1537384247,
        "name": "x7x7x7x",
        "platform": "WINDOWS",
        "status": "off",
        "uuid": "0696ee38-402a-4866-b753-2816482dfce6"
    }],
"pagination": {
    "limit": 5000,
    "offset": 0,
    "sort": [
        {
            "name": "name",
            "order": "asc"
        }
    ],
    "total": 14416
 }
}

To do this, I wrote the following code:

import json
from pandas.io.json import json_normalize

with open('out.json') as f:
    data = json.load(f)

df = json_normalize(data, 'agents', [['groups', 'name']], errors='ignore')
print(df)

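This unpacks all the fields under "agents" as they are (leaving "groups" as a multi-valued field), plus a new column named "groups.name" that is empty (every value is NaN).

I want only the fields under "agents" unpacked into the DataFrame, with the fields under "groups" unpacked into separate columns ('core_build', 'core_version', 'distro', 'groups.name', 'id', 'ip', 'last_connect', 'last_scanned', 'linked_on', 'name', 'platform', 'plugin_feed_id', 'status', 'uuid').

How can I achieve this?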

Edit: Running the following

df = json_normalize(pd.concat([pd.DataFrame(i) for i in data['agents']]).to_dict('r'))

raises ValueError: If using all scalar values, you must pass an index

Answer 1

You can use pd.concat() with a list comprehension:

df = pd.concat([pd.DataFrame(i) for i in my_json['agents']])
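One caveat with this one-liner: pd.DataFrame(i) takes its row count from the "groups" list, so an agent without a "groups" key (like the third one in the question) consists only of scalar values and triggers the ValueError quoted in the question's edit. A minimal sketch of a workaround follows, assuming it is acceptable to keep such agents as a single row with an empty group; the helper name agent_to_frame and the trimmed sample data are illustrative, not part of the original answer.

```python
import pandas as pd

# Trimmed-down copy of the question's data (most fields omitted for brevity).
data = {
    "agents": [
        {"id": 2198802, "name": "x1x1x1x1",
         "groups": [{"id": 101819, "name": "O Laptops"}]},
        {"id": 2097601, "name": "x2xx2x2x2",
         "groups": [{"id": 101839, "name": "G Personal"},
                    {"id": 102037, "name": "W6"},
                    {"id": 102049, "name": "MS8"}]},
        {"id": 2234103, "name": "x7x7x7x"},  # no "groups" key at all
    ]
}


def agent_to_frame(agent):
    """Build one row per group; agents without groups get one placeholder row."""
    # Hypothetical helper: a one-element list gives pandas a length to
    # broadcast the scalar fields against, which avoids the ValueError.
    groups = agent.get("groups") or [None]
    return pd.DataFrame({**agent, "groups": groups})


df = pd.concat([agent_to_frame(a) for a in data["agents"]], ignore_index=True)
print(df)
```

The result has one row per (agent, group) pair, and the "groups" column still holds dicts (or a missing value for agents that have no groups).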

Or, if you want to unpack the groups column, whose values are of type dict, into separate columns, try the following:

df = json_normalize(pd.concat([pd.DataFrame(i) for i in my_json['agents']]).to_dict('records'))
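As an alternative that skips the intermediate concat, json_normalize can flatten the nested list directly through record_path and meta. In the question's attempt, the meta path ['groups', 'name'] is looked up on the object that encloses the "agents" list rather than inside each agent, which is most likely why that column came back as NaN. The sketch below assumes a recent pandas (1.0 or later, where pd.json_normalize is available) and pre-fills a placeholder [{}] for agents that have no "groups" key; the trimmed sample data and the variable names are illustrative.

```python
import pandas as pd

# Trimmed-down copy of the question's data (most fields omitted for brevity).
data = {
    "agents": [
        {"id": 2198802, "name": "x1x1x1x1", "last_scanned": 1539373347,
         "groups": [{"id": 101819, "name": "O Laptops"}]},
        {"id": 2097601, "name": "x2xx2x2x2", "last_scanned": 1539437865,
         "groups": [{"id": 101839, "name": "G Personal"},
                    {"id": 102037, "name": "W6"},
                    {"id": 102049, "name": "MS8"}]},
        {"id": 2234103, "name": "x7x7x7x"},  # no "groups", no "last_scanned"
    ]
}

# record_path requires the key on every record, so give agents without
# groups a single empty group; it becomes one row of NaN group fields.
agents = [{**a, "groups": a.get("groups") or [{}]} for a in data["agents"]]

flat = pd.json_normalize(
    agents,
    record_path="groups",                  # one output row per group
    meta=["id", "name", "last_scanned"],   # agent-level fields to repeat
    record_prefix="groups.",               # avoids the id/name name clash
    errors="ignore",                       # missing meta fields become NaN
)
print(flat)
```

The record_prefix matters here because both the agent and its groups have "id" and "name" fields; without a prefix pandas reports conflicting metadata names. Extend the meta list with the remaining agent fields as needed.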
