I have this JSON file:
{"a": [{"Name": "name1",
"number": "number1",
"defaultPrice": {"p": "232", "currency": "CAD"},
"prices": {"DZ": {"p": "62", "currency": "RMB"},
"AU": {"p": "73", "currency": "AUD"},
"lg": "en"}},
{"Name": "name2",
"number": "number2",
"defaultPrice": {"p": "233", "currency": "CAD"},
"prices": {"DZ": {"p": "63", "currency": "RMB"},
"US": {"p": "72", "currency": "USD"},
"Lg": "en"}}]}
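Right now I get a table with Name, number, defaultPrice and prices columns, but the prices column looks like three rows per cell, and I need to read the price (for example 63) from the key p in {"p": "63", "currency": "RMB"}.
What I want instead is a table with the price and the currency in separate columns. I tried this:
ndf = pd.concat([pd.Series(x) for x in prices], axis=1)
But it gives the wrong result: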
0 1
DZ {"p": "232", "currency": "CAD"} {"p": "62", "currency": "RMB"}
AU {"p": "233", "currency": "CAD"} {"p": "63","currency":"RMB"}
Is there any way to fix this so that I get the expected output?
Name Number Code currency
name1 number1 AU AUD
name1 number1 DZ RMB
Thanks a lot!
Answer 0 (score: 0)
You can use apply(pd.Series) on the defaultPrice column to split it into separate columns, and then concatenate the result back onto the original DataFrame.
import pandas as pd

prices = {"a": [{"Name": "name1",
"number": "number1",
"defaultPrice": {"p": "232", "currency": "CAD"},
"prices": {"DZ": {"p": "62", "currency": "RMB"},
"AU": {"p": "73", "currency": "AUD"},
"lg": "en"}},
{"Name": "name2",
"number": "number2",
"defaultPrice": {"p": "233", "currency": "CAD"},
"prices": {"DZ": {"p": "63", "currency": "RMB"},
"US": {"p": "72", "currency": "USD"},
"Lg": "en"}}]}
ndf = pd.DataFrame(prices['a'])
# Expand each defaultPrice dict into 'p' and 'currency' columns, then drop the original column
pd.concat([ndf, ndf['defaultPrice'].apply(pd.Series)], axis=1).drop('defaultPrice', axis=1)
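Note that your prices column still holds a dictionary in every row. Since you did not say how you want it handled, I left it as it is (it is not shown in the output below).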
Output:
Name number p currency
name1 number1 232 CAD
name2 number2 233 CAD
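As a possible alternative to apply(pd.Series), the sketch below flattens the nested defaultPrice dictionary with pd.json_normalize, which turns nested keys into dotted column names. It assumes pandas 1.0 or later (where json_normalize is exposed as pd.json_normalize). The data is trimmed to the fields used here, and the names data and flat are only illustrative.

```python
import pandas as pd

# Same structure as the question's data, trimmed to the fields used here
data = {"a": [
    {"Name": "name1", "number": "number1",
     "defaultPrice": {"p": "232", "currency": "CAD"}},
    {"Name": "name2", "number": "number2",
     "defaultPrice": {"p": "233", "currency": "CAD"}},
]}

# json_normalize expands nested dicts into dotted column names,
# e.g. "defaultPrice.p" and "defaultPrice.currency"
flat = pd.json_normalize(data["a"])

# Rename the dotted columns to match the column names used in the answer
flat = flat.rename(columns={"defaultPrice.p": "p",
                            "defaultPrice.currency": "currency"})
print(flat)
```

With the full data from the question, the same call would also produce one column per country under prices (prices.DZ.p and so on), so you would still need to pick out the columns you want.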
Answer 1 (score: 0)
The JSON data:
j = {"a": [{ "Name": "name1",
"number": "number1",
"defaultPrice": {"p": "232", "currency": "CAD"},
"prices": {"DZ": {"p": "62", "currency": "RMB"},
"AU": {"p": "73", "currency": "AUD"},
"lg": "en"
}
},
{"Name": "name2",
"number": "number2",
"defaultPrice": {"p": "233", "currency": "CAD"},
"prices": {"DZ": {"p": "63", "currency": "RMB"},
"US": {"p": "72", "currency": "USD"},
"Lg": "en"
}
}
]}
Code to get the desired output:
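from pandas.io.json import json_normalize  # on pandas >= 1.0 this is pd.json_normalize

# Collect every key that appears under "prices", then drop the language keys
country_codes = set()
for d in j['a']:
    c = d['prices'].keys()
    country_codes.update(c)
country_codes = sorted([i for i in country_codes if i not in ['lg', 'Lg']])
country_codes

# Carry Name, number and each country's price and currency along as metadata
meta = ['Name', 'number'] + [['prices', c, 'p'] for c in country_codes] + [['prices', c, 'currency'] for c in country_codes]
# Note: record_path points at a dict here, so each key becomes one row;
# newer pandas versions may reject a non-list record_path with a TypeError.
df = json_normalize(j['a'], record_path='prices', meta=meta, errors='ignore')
df = df.rename(columns={0: 'countryCode'})
df = df[~df['countryCode'].isin(['lg', 'Lg'])]

# For each row, copy the price and currency that belong to that row's country code
for idx, row in df.iterrows():
    country = row['countryCode']
    col_price = df.columns[df.columns.str.contains(country + '.p', regex=False)][0]
    col_currency = df.columns[df.columns.str.contains(country + '.currency', regex=False)][0]
    price = row[col_price]
    currency = row[col_currency]
    df.loc[idx, 'price'] = price
    df.loc[idx, 'currency'] = currency

df = df[['Name', 'number', 'countryCode', 'currency', 'price']]
df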
This gives:
Name number countryCode currency price
0 name1 number1 DZ RMB 62
1 name1 number1 AU AUD 73
3 name2 number2 DZ RMB 63
4 name2 number2 US USD 72
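The same long-format table can probably be built without json_normalize by looping over the records directly. The sketch below is not the answer's method. It assumes that every real price entry under prices is a dictionary with p and currency keys, and that anything else (such as the lg / Lg strings) should be skipped. The names rows and long_df are only illustrative.

```python
import pandas as pd

j = {"a": [
    {"Name": "name1", "number": "number1",
     "defaultPrice": {"p": "232", "currency": "CAD"},
     "prices": {"DZ": {"p": "62", "currency": "RMB"},
                "AU": {"p": "73", "currency": "AUD"},
                "lg": "en"}},
    {"Name": "name2", "number": "number2",
     "defaultPrice": {"p": "233", "currency": "CAD"},
     "prices": {"DZ": {"p": "63", "currency": "RMB"},
                "US": {"p": "72", "currency": "USD"},
                "Lg": "en"}},
]}

rows = []
for item in j["a"]:
    for code, entry in item["prices"].items():
        # Assumption: only dict values are price entries; "lg"/"Lg" hold plain strings
        if not isinstance(entry, dict):
            continue
        rows.append({"Name": item["Name"],
                     "number": item["number"],
                     "countryCode": code,
                     "currency": entry["currency"],
                     "price": entry["p"]})

# One row per (record, country code) pair
long_df = pd.DataFrame(rows)
print(long_df)
```

Because the language entries are never turned into rows, the index here runs from 0 to 3 without gaps, unlike the filtered result shown above.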