Parsing nested JSON with Python and Pandas

Time: 2018-01-20 12:13:15

Tags: pandas

I want to parse this JSON response:

{
    "status": "ok",
    "results_time": "0.6756 sec.",
    "results_count": 1,
    "results": [
        {
            "date": "2017-01-01",
            "site_url": "asana.com",
            "site_title": "Use Asana to track your team’s work & manage projects · Asana",
            "site_description": "It’s free to use, simple to get started, and powerful enough to run your entire business. Sign up for free today.",
            "audience": {
                "visits": 19952871,
                "time_on_site_avg": "00:09:25",
                "page_views_avg": 6.9773123942789,
                "bounce_rate": 35.85
            },
            "traffic": {
                "value": 19952871,
                "percent": 100,
                "countries": [
                    {
                        "country": "United States",
                        "value": 6864349,
                        "percent": 34.4
                    },
                    {
                        "country": "United Kingdom",
                        "value": 1133338,
                        "percent": 5.68
                    },
                    {
                        "country": "Brazil",
                        "value": 705693,
                        "percent": 3.54
                    },
                    {
                        "country": "Canada",
                        "value": 703566,
                        "percent": 3.53
                    },
                    {
                        "country": "Poland",
                        "value": 700182,
                        "percent": 3.51
                    },
                    {
                        "country": "Other",
                        "value": 984474655,
                        "percent": 49.34
                    }
                ],

.........     }

I want to export a CSV with these fields:

audience.visits
audience.time_on_site_avg
audience.page_views_avg
audience.bounce_rate
traffic.countries.country
traffic.countries.value
traffic.countries.percent

I have this code, but it does not work.

import json 
import pandas as pd 
from pandas.io.json import json_normalize 

with open('dict.competitor') as f:
    d = json.load(f)

traffic1 = json_normalize(data=d['results'], record_path=['traffic', 'countries'])

print(traffic1)

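I feel like I'm almost there. I have tried several combinations and suggestions from other SO posts to get the remaining data, but so far nothing has worked. I know the problem comes from the nesting; I just need to find a way to get the desired result. Thanks for your help!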

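1 Answer:

Answer 0: (score: 0)

There may be a smarter pandas way to do this (which would be nice to see), but here is a loop that should produce the desired result.

As I understand your question, this produces a DataFrame with one row per country entry and the audience fields repeated on each row, which can then be exported to CSV: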


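import json
import pandas as pd

# Load the API response (file name taken from the question).
with open("dict.competitor") as f:
    the_json = json.load(f)

data = []

for r in the_json["results"]:
    # One output row per country entry.
    for d in r["traffic"]["countries"]:
        row = {}
        # Country-level fields: country, value, percent.
        for key in d:
            row["traffic.{}".format(key)] = d[key]

        # Repeat the site-level audience fields on every row.
        for key in r["audience"]:
            row["audience.{}".format(key)] = r["audience"][key]

        data.append(row)

df = pd.DataFrame(data)

df.to_csv("filename.csv")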