Question

I am trying to normalize a JSON file that looks like this (a small excerpt):

[{'trimestre': 'A2000',
  'cours': [{"sigle":"TECH 20701", "titre":"La cybersécurité et le gestionnaire",'etudiants': [{'matricule': '22000803',
      'nom': 'Boyer,AndrÃ©',
      'note': 'C+',
      'valeur': 2.3},
     {'matricule': '22000829',
      'nom': 'Keighan,Maylis',
      'note': 'A+',
      'valeur': 4.3},
     {'matricule': '22000869',
      'nom': 'Lahaie,Lyes',
      'note': 'B+',
      'valeur': 3.3},
     {'matricule': '22000973',
      'nom': 'Conerardy,Rawaa',
      'note': 'B+',
      'valeur': 3.3},
      ]}]}]

I am trying to get a table that looks like this:

                                    **"trimestre"** (columns)
      **"sigle" + "titre"** (index): *valeur*

import pandas as pd
import json
import numpy as np
from pandas.io.json import json_normalize

data = pd.read_json('DataTP2.json')
print(data)

I tried using the normalize function like this:

result = json_normalize(data, 'cours',['trimestre'])
print(result)

But I get an error: TypeError: string indices must be integers

Basically, I want "sigle" + "titre" (from "cours") as the index, "trimestre" as the columns, and the mean of "valeur" as the values in the table.

Thanks!

Answer 1

Here you go:

from collections import defaultdict
import json

import pandas as pd

with open("data.json", "r") as f:
    data = json.load(f)

test = [{'trimestre': 'A2000',
  'cours': [{"sigle":"TECH 20701", "titre":"La cybersécurité et le gestionnaire",'etudiants': [{'matricule': '22000803',
      'nom': 'Boyer,AndrÃ©',
      'note': 'C+',
      'valeur': 2.3},
     {'matricule': '22000829',
      'nom': 'Keighan,Maylis',
      'note': 'A+',
      'valeur': 4.3},
     {'matricule': '22000869',
      'nom': 'Lahaie,Lyes',
      'note': 'B+',
      'valeur': 3.3},
     {'matricule': '22000973',
      'nom': 'Conerardy,Rawaa',
      'note': 'B+',
      'valeur': 3.3},
      ]}]}]


# Map each course label (the index) to {trimestre: mean of "valeur"}
results = defaultdict(dict)

for trimestre in data:
    for cours in trimestre["cours"]:
        label = f"{cours['sigle']} {cours['titre']}"
        valeurs = [etudiant["valeur"] for etudiant in cours["etudiants"]]
        # Mean of "valeur" over the students of this course
        results[label][trimestre["trimestre"]] = sum(valeurs) / len(valeurs)

# Outer keys become the index, inner keys become the columns
df = pd.DataFrame.from_dict(results, orient="index")

Result

>>> print(df)
                                                A2000
TECH 20701 La cybersécurité et le gestionnaire    3.3
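
As for the TypeError in the question: pd.read_json returns a DataFrame, and as far as I can tell json_normalize then iterates over it and gets the column labels (plain strings), so looking up 'cours' on a string fails. Passing the parsed list of dicts instead should avoid that. Below is a minimal sketch of that route, assuming pandas 1.0 or later (where json_normalize is available as pd.json_normalize) and a trimmed inline copy of the sample data; the names flat and table are just illustrative.

```python
import pandas as pd

# Trimmed inline copy of the sample from the question (only the fields used
# here); with the real file this would be the result of json.load(...).
data = [{
    "trimestre": "A2000",
    "cours": [{
        "sigle": "TECH 20701",
        "titre": "La cybersécurité et le gestionnaire",
        "etudiants": [
            {"matricule": "22000803", "valeur": 2.3},
            {"matricule": "22000829", "valeur": 4.3},
            {"matricule": "22000869", "valeur": 3.3},
            {"matricule": "22000973", "valeur": 3.3},
        ],
    }],
}]

# One row per student; the trimestre and the course fields are carried
# along as metadata columns ("trimestre", "cours.sigle", "cours.titre").
flat = pd.json_normalize(
    data,
    record_path=["cours", "etudiants"],
    meta=["trimestre", ["cours", "sigle"], ["cours", "titre"]],
)

# Build the "sigle titre" label that will serve as the index.
flat["cours"] = flat["cours.sigle"] + " " + flat["cours.titre"]

# Mean of "valeur" per course (rows) and trimestre (columns).
table = flat.pivot_table(
    index="cours", columns="trimestre", values="valeur", aggfunc="mean"
)
print(table)
```

With several trimestres and courses in the file, pivot_table should fill one cell per (course, trimestre) pair and leave NaN where a course was not offered in a given trimestre.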
