Question

I am pulling output from an API that looks like this (formatted as best I could):

{
    "other":{
                Not important.. (ignored later)
            },
    "resultList":[
        {
            "date": "2017-10-26T21:52:59.840Z",
            "uniqueId": "c0a9c665-0f6f-c8",
            "children":[
                {
                    "identifier": "FAMR@316069707@3160697070",
                    "score": 1,
                    "parentId": "c0a9c665-0f6f-4fc8"
                },
                {
                    Same format as first child...
                },
                {
                    Same format as first child...
                }
            ],
            "weights":[
                60,
                20,
                20
            ],
            "type": "ABC"
        },
        {
            Same format as first dictionary…
        }
    ]
}

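Based on a search of Stack Overflow, I handled this by extracting the JSON, normalizing only resultList (the only part I care about), then orienting by columns and converting to a pandas DataFrame. Here is the code: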

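import getpass  # needed for getpass.getpass() below
import requests
import pandas as pd
from pandas.io.json import json_normalize  # pandas >= 1.0 also exposes this as pd.json_normalize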

# Get JSON from API
user = str(input("Enter User Name: ")) 
password = getpass.getpass("Enter Password: ") 
url = 'https://API_url'
req = requests.post(url = url, auth=(user, password))
out = req.json()

# Create normalized dataframe from API
solr_df = pd.DataFrame.from_dict(json_normalize(out["resultList"]), orient='columns')

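However, while this flattens resultList into columns, the children column is still nested as a list of dictionaries (with a u prefix on the strings, which I don't want), and the weights column is still a list.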
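To make the problem concrete, here is a minimal sketch that reproduces it. The `sample` dictionary is a hypothetical, trimmed-down stand-in for the real response (the identifiers, scores, and weights are made up), and it uses `pd.json_normalize`, which assumes pandas 1.0 or newer.

```python
import pandas as pd

# Hypothetical sample shaped like the response above; the values are made up.
sample = {
    "resultList": [
        {
            "date": "2017-10-26T21:52:59.840Z",
            "uniqueId": "c0a9c665-0f6f-c8",
            "children": [
                {"identifier": "A", "score": 1, "parentId": "p1"},
                {"identifier": "B", "score": 2, "parentId": "p1"},
            ],
            "weights": [60, 40],
            "type": "ABC",
        }
    ]
}

# Normalize only the part of the response that matters.
flat = pd.json_normalize(sample["resultList"])

# Scalar fields become columns, but list-valued fields are left untouched.
print(flat.columns.tolist())
print(type(flat.loc[0, "children"]))  # still a Python list (of dicts)
print(type(flat.loc[0, "weights"]))   # still a Python list
```

The result has one row per entry in resultList, and each cell of children and weights holds a whole list rather than separate rows or columns.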

Can you help restructure this so that it returns a result where children and weights are flattened into columns?

Thanks in advance!

Answer 1

I can't think of a more efficient way to do this, though I'm sure one exists.

Loop through the JSON object and flatten the data manually.

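import pandas as pd

# r is the parsed JSON response (called `out` in the question)
frames = []
for record in r['resultList']:

    conc = []          # one DataFrame per list-valued field
    otherFields = {}   # scalar fields, repeated on every row

    for field in record:

        if isinstance(record[field], list):
            if len(record[field]) > 0:
                if isinstance(record[field][0], dict):
                    # list of dicts (e.g. children): one column per key
                    conc.append(pd.DataFrame(record[field]))

                else:
                    # list of scalars (e.g. weights): a single column
                    conc.append(pd.DataFrame(record[field], columns=[field]))

        else:
            otherFields[field] = record[field]

    # place the list-valued fields side by side, aligned by position
    df = pd.concat(conc, axis=1)

    for field in otherFields:
        df[field] = otherFields[field]

    frames.append(df)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
dfAll = pd.concat(frames)

dfAll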


   weights                 identifier            parentId  score  \
0       60  FAMR@316069707@3160697070  c0a9c665-0f6f-4fc8      1   
1       20  FAMR@316069707@3160697070  c0a9c665-0f6f-4fc8      1   
2       20  FAMR@316069707@3160697070  c0a9c665-0f6f-4fc8      1   
0       10  FAMR@316069707@3160697070  c0a9c665-0f6f-4fc8      1   
1       20  FAMR@316069707@3160697070  c0a9c665-0f6f-4fc8      1   
2       30  FAMR@316069707@3160697070  c0a9c665-0f6f-4fc8      1   

                       date type          uniqueId  
0  2017-10-26T21:52:59.840Z  ABC  c0a9c665-0f6f-c8  
1  2017-10-26T21:52:59.840Z  ABC  c0a9c665-0f6f-c8  
2  2017-10-26T21:52:59.840Z  ABC  c0a9c665-0f6f-c8  
0  2015-10-26T21:52:59.840Z  ABC               123  
1  2015-10-26T21:52:59.840Z  ABC               123  
2  2015-10-26T21:52:59.840Z  ABC               123
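
As an alternative to the manual loop, the same shape can probably be reached with `pd.json_normalize` and its `record_path` and `meta` parameters. This is only a sketch under assumptions: it needs pandas 1.0 or newer, the `records` list is a hypothetical stand-in for `resultList` with made-up values, and it assumes every record has exactly as many weights as children, matched by position.

```python
import pandas as pd

# Hypothetical stand-in for the response's resultList; the values are made up.
records = [
    {
        "date": "2017-10-26T21:52:59.840Z",
        "uniqueId": "c0a9c665-0f6f-c8",
        "children": [
            {"identifier": "A", "score": 1, "parentId": "p1"},
            {"identifier": "B", "score": 2, "parentId": "p1"},
        ],
        "weights": [60, 40],
        "type": "ABC",
    },
    {
        "date": "2015-10-26T21:52:59.840Z",
        "uniqueId": "123",
        "children": [
            {"identifier": "C", "score": 3, "parentId": "p2"},
        ],
        "weights": [100],
        "type": "ABC",
    },
]

# One row per child; the scalar fields of the parent are repeated via `meta`.
children = pd.json_normalize(
    records,
    record_path="children",
    meta=["date", "uniqueId", "type"],
)

# Assumption: weights pair up with children by position, so flattening them
# in the same record order gives one weight per child row.
children["weights"] = [w for rec in records for w in rec["weights"]]

print(children)
```

If a record ever has a different number of weights than children, the column assignment raises a length error, which at least makes the mismatch visible instead of silently misaligning rows.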

Flatten nested JSON with multiple column types (some nested)
