Extracting data from a JSON file with nested arrays into a DataFrame

Posted: 2017-11-09 13:49:48

Tags: python pandas

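I have a JSON file that contains arrays nested inside arrays. With the code below I can get all the "parts", but I can't figure out which json_normalize parameters to use to extract the different levels of the nested arrays.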

That is, for each part I want the 'id' from the vehicle array, the 'id' from the model array, and the fields of the parts array, e.g.:

car | camry | "value":"engine","charge":10.82

Thanks

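import json
import pandas as pd

with open('sample.json') as f:
    data = json.load(f)

# One row per model; 'parts' holds the nested list (NaN if the model has none)
df1 = pd.json_normalize(data['vehicle'], 'model')
df2 = df1[['parts']]
ddf = pd.DataFrame(columns=['value', 'charge'])

# Copy the first part of each model that has a parts list
for index, row in df2.iterrows():
    parts = row['parts']
    if isinstance(parts, list):
        ddf.loc[index] = [parts[0]['value'], parts[0]['charge']]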


{
  "vehicle": [
    {
      "id": "car",
      "model": [
        {
          "id": "camry",
          "parts": [
            {"value": "engine", "charge": 10.82}
          ]
        },
        {
          "id": "avelon",
          "parts": [
            {"value": "seats", "charge": 538.26}
          ]
        },
        {
          "id": "prius",
          "parts": [
            {"value": "seats", "charge": 10.91}
          ]
        },
        {
          "id": "corolla",
          "markup": {"value": "61"},
          "accessories": [
            {"value": "vvvvv"}
          ]
        }
      ]
    }
  ]
}
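As a sketch of the json_normalize parameters the question asks about (assuming pandas >= 1.0, where it is available as pd.json_normalize): record_path can walk both nesting levels (model, then parts), and meta can carry the vehicle id and the model id down onto each part row. json_normalize raises a KeyError when a record lacks the record_path key, which is the case for corolla here, so the sketch drops models without "parts" first. The column names vehicle_id and model_id are illustrative choices, not part of the original data.

```python
import pandas as pd

# Inline copy of the question's sample.json
data = {
    "vehicle": [
        {"id": "car", "model": [
            {"id": "camry", "parts": [{"value": "engine", "charge": 10.82}]},
            {"id": "avelon", "parts": [{"value": "seats", "charge": 538.26}]},
            {"id": "prius", "parts": [{"value": "seats", "charge": 10.91}]},
            {"id": "corolla", "markup": {"value": "61"},
             "accessories": [{"value": "vvvvv"}]},
        ]}
    ]
}

# json_normalize raises KeyError if a record lacks the record_path key,
# so keep only the models that actually have a 'parts' list.
vehicles = [
    {**v, "model": [m for m in v["model"] if "parts" in m]}
    for v in data["vehicle"]
]

# One row per part; vehicle 'id' and model 'id' are carried along as metadata.
df = pd.json_normalize(
    vehicles,
    record_path=["model", "parts"],
    meta=["id", ["model", "id"]],
)

# Hypothetical friendlier column names and order
df = df.rename(columns={"id": "vehicle_id", "model.id": "model_id"})
df = df[["vehicle_id", "model_id", "value", "charge"]]
print(df)
```

Each meta entry is a path: "id" is read at the vehicle level, while ["model", "id"] is read one level down, which is why the two ids end up as separate columns.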

1 Answer:

Answer 0 (score: 1)

I think you need:

# Drop NaN rows (models without a 'parts' list)
s = df1['parts'].dropna()
# Build a new DataFrame from the first element of each list
# (assumes every list always holds exactly one part)
df2 = pd.DataFrame(s.str[0].values.tolist(), index=s.index)
print(df2)
   charge   value
0   10.82  engine
1  538.26   seats
2   10.91   seats

# Join back to the model ids; models without parts get NaN
df = df1[['id']].join(df2)
print(df)
        id  charge   value
0    camry   10.82  engine
1   avelon  538.26   seats
2    prius   10.91   seats
3  corolla     NaN     NaN
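The answer keeps only s.str[0], i.e. the first part of each list. If a model can hold several parts, one possible variant (a sketch, assuming pandas >= 1.1 for explode(..., ignore_index=True)) explodes the lists into one row per part before expanding the dicts. The DataFrame below is rebuilt to match df1's shape; the second camry part ("tires", 99.0) is hypothetical and only there to show a list with more than one element.

```python
import pandas as pd

# DataFrame shaped like df1 (one row per model).
# The second camry part is hypothetical, added to show a list with >1 element.
df1 = pd.DataFrame({
    "id": ["camry", "avelon", "prius", "corolla"],
    "parts": [
        [{"value": "engine", "charge": 10.82}, {"value": "tires", "charge": 99.0}],
        [{"value": "seats", "charge": 538.26}],
        [{"value": "seats", "charge": 10.91}],
        float("nan"),
    ],
})

# One row per list element; the NaN (no parts) row is kept as a single NaN row.
exploded = df1.explode("parts", ignore_index=True)

# Expand each part dict into columns; non-dict rows (NaN) become all-NaN rows.
parts = pd.DataFrame(
    [p if isinstance(p, dict) else {} for p in exploded["parts"]],
    index=exploded.index,
)

df = exploded[["id"]].join(parts)
print(df)
```

ignore_index=True gives the exploded rows a unique index, which keeps the final join one-to-one instead of multiplying rows that share an index label.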