How to parse multi-index values and create a CSV file when parsing JSON data in Python

Time: 2019-05-31 05:51:50

Tags: python json pandas dataframe

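I have some sample data, shown below. The following attributes belong to the [data] dictionary. Under "XXXX" I have the value "Naveen", and under "YYYYY" I have the values "Kumar" and "Rajesh". I am trying to use the code below to get an output of 2 records.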

{
  "data": [
    {
      "Empid": "1234",
      "Empname": "ABC",
      "data1": {
        "XXXX": [
          {
            "relative": {
              "id": "Naveen"
            }
          }
        ],
        "YYYYY": [
          {
            "relative": {
              "id": "Kumar"
            }
          },
          {
            "relative": {
              "id": "Rajesh"
            }
          }
        ]
      }
    }
  ]
}
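The code below works on a variable called json_file that holds this JSON as a Python dict, but the question does not show how it is loaded. A minimal sketch, assuming the sample is available as a string (the file name in the comment is hypothetical):

```python
import json

# The sample from the question as a JSON string
raw = """
{
  "data": [
    {
      "Empid": "1234",
      "Empname": "ABC",
      "data1": {
        "XXXX": [{"relative": {"id": "Naveen"}}],
        "YYYYY": [
          {"relative": {"id": "Kumar"}},
          {"relative": {"id": "Rajesh"}}
        ]
      }
    }
  ]
}
"""
json_file = json.loads(raw)

# Reading from a file instead (hypothetical file name):
# with open("data.json", encoding="utf-8") as f:
#     json_file = json.load(f)

# Each id sits at data[i]["data1"][key][j]["relative"]["id"]
print(json_file["data"][0]["data1"]["YYYYY"][1]["relative"]["id"])
```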

Here is the code I am trying:

import pandas as pd

df = pd.DataFrame()
for i in range(len(json_file['data'])):
    temp = {}
    temp['Empid'] = json_file['data'][i]['Empid']
    temp['Empname'] = json_file['data'][i]['Empname']
    for key in json_file['data'][i]['data1'].keys():
        try:
            for j in range(len(json_file['data'][i]['data1'][key])):
                # each pass overwrites temp[key], so only the last id is kept
                temp[key] = json_file['data'][i]['data1'][key][j]['relative']['id']
        except (KeyError, TypeError):
            temp[key] = None
    temp_df = pd.DataFrame([temp])
    df = pd.concat([df, temp_df], sort=True)

The final output I want to achieve:

Empid  Empname  XXXX    YYYYY
1234   ABC      Naveen  Kumar
1234   ABC      NaN     Rajesh

But I only get 1 record:

Empid  Empname  XXXX    YYYYY
1234   ABC      Naveen  Rajesh

Any suggestions would be appreciated.

1 answer:

Answer 0 (score: 0)

A long solution that modifies your code: add one more loop, change the indexing, and modify the range argument:

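import pandas as pd

df = pd.DataFrame()

# note: only the first record, json_file['data'][0], is handled
# number of output rows = length of the longest list under data1 (2 here)
num = max([len(v) for k, v in json_file['data'][0]['data1'].items()])
for i in range(num):
    temp = {}
    temp['Empid'] = json_file['data'][0]['Empid']
    temp['Empname'] = json_file['data'][0]['Empname']
    for key in json_file['data'][0]['data1'].keys():
        if key not in temp:
            temp[key] = []
        try:
            # collect every id under this key into a list instead of overwriting
            for j in range(len(json_file['data'][0]['data1'][key])):
                temp[key].append(json_file['data'][0]['data1'][key][j]['relative']['id'])
        except (KeyError, TypeError):
            temp[key] = None
    temp_df = pd.DataFrame([temp])
    df = pd.concat([df, temp_df], ignore_index=True)

# flatten each list column and drop the repeated ids; assigning the shorter
# Series aligns on the index, so missing positions become NaN
for i in json_file['data'][0]['data1'].keys():
    df[i] = pd.Series([x for y in df[i].tolist() for x in y]).drop_duplicates()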

Now running:

print(df)

gives:

  Empid Empname    XXXX   YYYYY
0  1234     ABC  Naveen   Kumar
1  1234     ABC     NaN  Rajesh
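A shorter alternative, not part of the original answer: instead of building list-valued cells and then de-duplicating them, itertools.zip_longest can pair the i-th id of every key directly and pad the shorter lists. This sketch loops over every record in data (the answer only handles data[0]), assumes each item has the relative/id shape shown in the sample, and uses a hypothetical helper name, records_to_rows. It ends by producing the CSV the title asks for.

```python
from itertools import zip_longest

import pandas as pd

# The sample from the question, already parsed into a dict
json_file = {
    "data": [
        {
            "Empid": "1234",
            "Empname": "ABC",
            "data1": {
                "XXXX": [{"relative": {"id": "Naveen"}}],
                "YYYYY": [
                    {"relative": {"id": "Kumar"}},
                    {"relative": {"id": "Rajesh"}},
                ],
            },
        }
    ]
}


def records_to_rows(json_file):
    """Hypothetical helper: one output row per position in the longest id list."""
    rows = []
    for record in json_file["data"]:
        keys = list(record["data1"])
        # Pull out the ids under each key, keeping their order
        id_lists = [
            [item["relative"]["id"] for item in record["data1"][key]]
            for key in keys
        ]
        # Pair the i-th id of every key; shorter lists are padded with NaN
        for values in zip_longest(*id_lists, fillvalue=float("nan")):
            row = {"Empid": record["Empid"], "Empname": record["Empname"]}
            row.update(zip(keys, values))
            rows.append(row)
    return rows


df = pd.DataFrame(records_to_rows(json_file))
print(df)

# With no path, to_csv returns the CSV text; NaN becomes an empty field.
# To write a file instead, pass a path (hypothetical file name):
# df.to_csv("employees.csv", index=False)
csv_text = df.to_csv(index=False)
print(csv_text)
```

Unlike the drop_duplicates step in the answer, this approach does not discard an id that legitimately appears twice under the same key.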