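Processing large JSON with Python

Time: 2018-10-09 20:40:45

Tags: python json pandas dictionary

I have a fairly large data file in .json format that I want to process. Its format is shown below; it looks like many JSON objects placed one after another: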

[
{ 
    "_id" : "...", 
    "idSession" : "...", 
    "createdAt" : "1526894989268", 
    "status" : "COMPLETE", 
    "raw" : "Bobsguide,Marketing Assistant,Sales / Marketing79642,Baitshepi,,etc", 
    "updatedAt" : "...", 
    "graphResults" : [

        [
            "lastName", 
            "stock"
        ], 
        [
            "country", 
            "Botswana"
        ], 
        [
            "location", 
            "Botswana  "
        ], 
        [
            "city", 
            "-"
        ], 
        [
            "state", 
            "-"
        ], 
        [
            "school", 
            "Heriot-Watt University"
        ], 
        [
            "skills", 
            "Budgeting,Business Process Improvement,Business Planning"
        ], 

    ], 

    "eid" : {
        "###" : "12020653-1889-35be-8009-b1c9d43768ac"
    }
}

{ 
    "_id" : "...", 
    "idSession" : "...", 
    "createdAt" : "1526894989268", 
    "status" : "COMPLETE", 
    "raw" : "Bobsguide,79619,Steven,example,steven.jones@example.com,Marketing Assistant,Sales,,etc", 
    "updatedAt" : "...", 
    "graphResults" : [
        [
            "country", 
            "United Kingdom"
        ], 
        [
            "location", 
            "United Kingdom London London"
        ], 
        [
            "city", 
            "London"
        ], 
        [
            "state", 
            "London"
        ], 
        [
            "skills", 
            "Solvency II,Liquidity Risk,Screening,etc"
        ]
    ], 

    "eid" : {
        "###" : "..."
    }
}

...



]

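Is there a straightforward way to read this into a Python script for manipulation/analysis? The main parts of interest are under the graphResults and raw keys. I have no experience with raw data in this form, so any help is much appreciated.

1 answer:

Answer 0: (score: 0)

First, the data you posted is not correct; it should look something like the following. To access the elements you mentioned, you can try the code that comes after it.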

{
    "test":[
    { 
        "_id" : "...", 
        "idSession" : "...", 
        "createdAt" : "1526894989268", 
        "status" : "COMPLETE", 
        "raw" : "Bobsguide,Marketing Assistant,Sales / Marketing79642,Baitshepi,,etc", 
        "updatedAt" : "...", 
        "graphResults" : [

            [
                "lastName", 
                "stock"
            ], 
            [
                "country", 
                "Botswana"
            ], 
            [
                "location", 
                "Botswana  "
            ], 
            [
                "city", 
                "-"
            ], 
            [
                "state", 
                "-"
            ], 
            [
                "school", 
                "Heriot-Watt University"
            ], 
            [
                "skills", 
                "Budgeting,Business Process Improvement,Business Planning"
            ]
        ], 
        "eid" : {
            "###" : "12020653-1889-35be-8009-b1c9d43768ac"
        }
        },
        { 
            "_id" : "...", 
            "idSession" : "...", 
            "createdAt" : "1526894989268", 
            "status" : "COMPLETE", 
            "raw" : "Bobsguide,79619,Steven,example,steven.jones@example.com,Marketing Assistant,Sales,,etc", 
            "updatedAt" : "...", 
            "graphResults" : [
                [
                    "country", 
                    "United Kingdom"
                ], 
                [
                    "location", 
                    "United Kingdom London London"
                ], 
                [
                    "city", 
                    "London"
                ], 
                [
                    "state", 
                    "London"
                ], 
                [
                    "skills", 
                    "Solvency II,Liquidity Risk,Screening,etc"
                ]
            ], 

            "eid" : {
                "###" : "..."
            }
        }
    ]
}
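
To check the point about validity, here is a minimal sketch using the standard-library json module. It parses two shortened strings: one with objects placed side by side as in the question, and one with the records separated by commas as in the layout above. The sample strings and the helper name is_valid_json are made up for illustration and are not part of the original data.

```python
import json

# Two objects side by side with no comma between them, as in the question (shortened).
posted = '[ {"status": "COMPLETE"} {"status": "COMPLETE"} ]'

# The same records separated by a comma and wrapped under a "test" key.
corrected = '{"test": [ {"status": "COMPLETE"}, {"status": "COMPLETE"} ]}'


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(posted))     # False
print(is_valid_json(corrected))  # True
```

The missing comma between the objects is what makes the first string fail. A trailing comma inside an array is rejected by json.loads for the same reason. Wrapping the array under a "test" key is one possible layout; a top-level array with correctly placed commas should parse as well.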

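# Answer

import json

# Load the whole file into memory; assumes data.json has the corrected structure above
with open('data.json', 'r') as data_file:
    information = json.load(data_file)  # returns a Python dict built from the JSON

# Pick element 1 (the second record) from the "test" array, then print the value under its "raw" key
print(information['test'][1]['raw'])

# Pick element 1 from the array, then print the value under its "graphResults" key
print(information['test'][1]['graphResults'])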
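
Since each graphResults entry is a two-item [name, value] list, one possible next step is to turn every record into a flat dictionary. The sketch below is only an illustration: the shortened sample string and the helper name summarize are assumptions, and splitting raw on commas assumes that no field contains a comma of its own.

```python
import json

# Hypothetical sample in the corrected layout, shortened from the data above.
sample = '''
{"test": [
  {"raw": "Bobsguide,Marketing Assistant,Sales",
   "graphResults": [["country", "Botswana"], ["city", "-"]]},
  {"raw": "Bobsguide,79619,Steven",
   "graphResults": [["country", "United Kingdom"], ["city", "London"]]}
]}
'''

information = json.loads(sample)


def summarize(record):
    """Flatten one record: graphResults pairs become keys, raw is split on commas."""
    # dict() accepts a list of two-item lists: [["country", "Botswana"]] -> {"country": "Botswana"}
    fields = dict(record["graphResults"])
    # Naive split; assumes fields in "raw" never contain embedded commas.
    fields["raw_parts"] = record["raw"].split(",")
    return fields


rows = [summarize(record) for record in information["test"]]
print(rows[1]["country"])    # United Kingdom
print(rows[0]["raw_parts"])  # ['Bobsguide', 'Marketing Assistant', 'Sales']
```

Because rows is a plain list of dictionaries, it can be passed to pandas.DataFrame(rows) if a tabular view is wanted; records that lack a key (for example "school") would then show up as missing values in that column.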