Reading nested JSON from a URL with Pandas

Time: 2019-06-18 03:24:53

Tags: python json pandas jupyter-notebook

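I know there are similar questions, but none of them seems to solve my problem. I am trying to create a DataFrame using only the information under "data".

My JSON file looks like this (complete file)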

{
"data": [
    {
    "ID Education Level": 1,
    "Education Level": "Enseñanza Básica",
    "ID Year": 2017,
    "Year": "2017",
    "ID Region": 8,
    "Region": "Biobío",
    "ID Comuna": 298,
    "Comuna": "San Pedro De La Paz",
    "Abandonment Percentage": 0.006858621805241022
    },
    {
    "ID Education Level": 2,
    "Education Level": "Enseñanza Media",
    "ID Year": 2017,
    "Year": "2017",
    "ID Region": 8,
    "Region": "Biobío",
    "ID Comuna": 298,
    "Comuna": "San Pedro De La Paz",
    "Abandonment Percentage": 0.01564914992272025
    },
    {
    "ID Education Level": 1,
    "Education Level": "Enseñanza Básica",
    "ID Year": 2016,
    "Year": "2016",
    "ID Region": 8,
    "Region": "Biobío",
    "ID Comuna": 298,
    "Comuna": "San Pedro De La Paz",
    "Abandonment Percentage": 0.006825490582135591
    }
],
"source": [
    {
    "measures": [
        "Abandonment Percentage"
    ],
    "annotations": {
        "source_name": "Creciendo con Derechos - Ministerior de Desarrollo Social",
        "source_description": "Sistema de indicadores para el seguimiento de los derechos de niños, niñas y adolescentes, en relación a sus condiciones de vida y en sintonía con la Convención sobre los Derechos del Niño.",
        "source_link": "http://www.creciendoconderechos.gob.cl/indicadores",
        "dataset_name": "mds_abandonment_rate",
        "dataset_link": "https://github.com/datachile/datachile-etl/tree/master/childhood/mds_abandonment_rate",
        "topic": "childhood",
        "subtopic": "abandonment_rate",
        "available_dimensions": "",
        "available_measures": ""
    },
    "name": "mds_abandonment_rate",
    "substitutions": []
    }
]
}

This is the DataFrame I want to end up with.

expected dataframe result

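I have read the read_json documentation and looked at some solutions, but they seem too complicated for what I am trying to do. I also need to do this for a series of URLs that return similar JSON, so doing it by hand is not an option.

Thank you for your answers. This is my first time here, so please forgive my poor English.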

2 answers:

Answer 0: (score: 0)

Here is code that produces the desired output shown in the image

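import pandas as pd
import json

# Load the saved JSON file into a Python dict
with open('data.json') as json_file:
    data = json.load(json_file)

# Build the DataFrame from the list of records under "data" only
df = pd.DataFrame(data['data'])
# index=False (a boolean, not the string 'false') keeps the row index out of the CSV
df.to_csv("output4_9.csv", encoding='utf-8', index=False)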

The CSV file contains the output.
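To see the same step without a file on disk, here is a minimal sketch that uses a trimmed-down inline payload shaped like the JSON in the question (the sample values are copied from it, and the `payload` name is just for illustration). It shows that only the list under "data" turns into rows, while "source" is left out.

```python
import pandas as pd

# Trimmed-down payload with the same shape as the JSON in the question.
payload = {
    "data": [
        {"ID Education Level": 1, "Education Level": "Enseñanza Básica",
         "Year": "2017", "Abandonment Percentage": 0.006858621805241022},
        {"ID Education Level": 2, "Education Level": "Enseñanza Media",
         "Year": "2017", "Abandonment Percentage": 0.01564914992272025},
    ],
    "source": [{"name": "mds_abandonment_rate", "substitutions": []}],
}

# Each dict in the "data" list becomes one row; the "source" key is ignored.
df = pd.DataFrame(payload["data"])
print(df)
```

Each key of the records becomes a column, so no extra flattening is needed for this particular layout.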

Here is the code that saves the response from the URL to a .json file:

    from urllib.request import urlopen
    import json
    import requests

    url = "https://es.datachile.io/api/data?measures=Abandonment%20Percentage&drilldowns=Education%20Level,Year&parents=true&Comuna=298"
    response = urlopen(url)
    data = json.loads(response.read())

    with open('data1.json', 'w') as fw:
        json.dump(data, fw)

Answer 1: (score: 0)

If you want to read JSON from a URL, fetch the URL with the requests library and parse the response as JSON into the variable 'data'.

import requests
link = 'https://es.datachile.io/api/data?measures=Abandonment%20Percentage&drilldowns=Education%20Level,Year&parents=true&Comuna=298'
resp = requests.get(url=link)
if resp.status_code == 200:
    data = resp.json()

The rest of the solution is as described in the answer above. I hope this helps.
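Since the question mentions a series of similar URLs, the fetch step and the DataFrame step could be wrapped in a loop. The following is only a sketch, assuming every URL returns a payload with a top-level "data" list. `fetch_json` and `frames_from_urls` are hypothetical helper names, and `urlopen` is used here instead of requests so the sketch depends only on the standard library and pandas.

```python
import json
from urllib.request import urlopen

import pandas as pd


def fetch_json(url):
    """Download a URL and parse its body as JSON (hypothetical helper)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def frames_from_urls(urls, fetch=fetch_json):
    """Build one DataFrame from the "data" list of every URL's payload.

    Assumes each payload has a top-level "data" key holding a list of records.
    The fetch argument can be swapped out, e.g. for a requests-based function.
    """
    frames = [pd.DataFrame(fetch(url)["data"]) for url in urls]
    # ignore_index=True renumbers the rows 0..n-1 across all payloads.
    return pd.concat(frames, ignore_index=True)
```

Calling `frames_from_urls([url_a, url_b])` with a list of API URLs would then return a single DataFrame with the rows from all of them. Note that `pd.concat` raises an error if the list of URLs is empty.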