How to write this JSON with multiple nested lists into separate columns in python pandas

Asked: 2019-03-18 20:03:50

Tags: python json pandas

I have this JSON file:

{"a": [{"Name": "name1",
"number": "number1",
"defaultPrice": {"p": "232", "currency": "CAD"},
"prices": {"DZ": {"p": "62", "currency": "RMB"},
 "AU": {"p": "73", "currency": "AUD"},
"lg": "en"}},
{"Name": "name2",
"number": "number2",
 "defaultPrice": {"p": "233", "currency": "CAD"},
 "prices": {"DZ": {"p": "63", "currency": "RMB"},
 "US": {"p": "72", "currency": "USD"},
 "Lg": "en"}}]}

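Right now I get a table with the columns Name, number, defaultPrice and prices, but each cell of the prices column holds three entries, and I need to read the price (for example 63) from the key p in {"p": "63", "currency": "RMB"}.

What I want instead is a table with the price and the currency in separate columns. I tried this:

ndf = pd.concat([pd.Series(x) for x in prices], axis=1)

But it gives the wrong result: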

 0                                                  1
 DZ           {"p": "232", "currency": "CAD"}  {"p": "62", "currency": "RMB"}
 AU           {"p": "233", "currency": "CAD"}    {"p": "63","currency":"RMB"}

Is there any way to fix this so that I get the expected output?

Name    Number   Code  currency
name1   number1   AU    AUD      
name1   number1   DZ    RMB      

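Thanks a lot!

2 Answers:

Answer 0: (score: 0)

You can use apply(pd.Series) on the defaultPrice column to split it into separate columns, then concatenate those columns back onto the original DataFrame.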

import pandas as pd

prices = {"a": [{"Name": "name1",
"number": "number1",
"defaultPrice": {"p": "232", "currency": "CAD"},
"prices": {"DZ": {"p": "62", "currency": "RMB"},
 "AU": {"p": "73", "currency": "AUD"},
"lg": "en"}},
{"Name": "name2",
"number": "number2",
 "defaultPrice": {"p": "233", "currency": "CAD"},
 "prices": {"DZ": {"p": "63", "currency": "RMB"},
 "US": {"p": "72", "currency": "USD"},
 "Lg": "en"}}]}

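# One row per record under "a"
ndf = pd.DataFrame(prices['a'])
# Expand each defaultPrice dict into its own columns (p, currency), then drop the original column
pd.concat([ndf, ndf['defaultPrice'].apply(pd.Series)], axis=1).drop('defaultPrice', axis=1)

Note that your prices column still holds nested dictionaries. Since you did not say how you want to handle it, I left it as is (and did not include it in the output).

Output: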

Name    number  p   currency
name1   number1 232 CAD
name2   number2 233 CAD
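
As an alternative sketch of the same split, pd.json_normalize (available as a top-level function in pandas 1.0 and later) flattens nested dictionaries into dotted column names in one call. The variable names records and flat are only illustrative, and the prices key is left out here to keep the example small:

```python
import pandas as pd

# Same records as above, without the nested "prices" key
records = [
    {"Name": "name1", "number": "number1",
     "defaultPrice": {"p": "232", "currency": "CAD"}},
    {"Name": "name2", "number": "number2",
     "defaultPrice": {"p": "233", "currency": "CAD"}},
]

# Nested dicts become columns named "defaultPrice.p" and "defaultPrice.currency"
flat = pd.json_normalize(records)

# Rename the dotted columns to the short names used in the output above
flat = flat.rename(columns={"defaultPrice.p": "p",
                            "defaultPrice.currency": "currency"})
flat = flat[["Name", "number", "p", "currency"]]
print(flat)
```

This avoids apply(pd.Series), which can be slow on large frames, at the cost of having to rename the dotted columns afterwards.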

Answer 1: (score: 0)

The JSON data:

j = {"a": [{ "Name": "name1",
             "number": "number1",
             "defaultPrice":  {"p": "232", "currency": "CAD"},
             "prices": {"DZ": {"p": "62", "currency": "RMB"},
                        "AU": {"p": "73", "currency": "AUD"},
                        "lg": "en"
                       }
             },
            {"Name": "name2",
             "number": "number2",
             "defaultPrice":  {"p": "233", "currency": "CAD"},
             "prices": {"DZ": {"p": "63", "currency": "RMB"},
                        "US": {"p": "72", "currency": "USD"},
                        "Lg": "en"
                       }
            }
          ]}

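Code to get the desired output:

# Import path used by pandas releases of that era; later releases expose it as pd.json_normalize
from pandas.io.json import json_normalize

# Collect every key that appears under "prices"
country_codes = set()
for d in j['a']:
  c = d['prices'].keys()
  country_codes.update(c)

# Drop the language keys, which are not country codes
country_codes = sorted([i for i in country_codes if i not in ['lg','Lg']])
country_codes

# Pull the price and currency of every country in as metadata columns
meta = ['Name','number'] + [['prices',c,'p'] for c in country_codes] + [['prices',c,'currency'] for c in country_codes]

# Iterating over the "prices" dict yields its keys, one row per key.
# This relies on older pandas; newer releases expect a list at record_path and may reject a dict.
df = json_normalize(j['a'], record_path = 'prices', meta = meta, errors='ignore')
df = df.rename(columns={0: 'countryCode'})
df = df[~df['countryCode'].isin(['lg','Lg'])]

# For each row, copy the price and currency that belong to its own country code
for idx, row in df.iterrows():
    country = row['countryCode']
    col_price = df.columns[df.columns.str.contains(country+'.p', regex=False)][0]
    col_currency = df.columns[df.columns.str.contains(country+'.currency', regex=False)][0]
    price = row[col_price]
    currency = row[col_currency]
    df.loc[idx,'price'] = price
    df.loc[idx,'currency'] = currency

df = df[['Name','number','countryCode', 'currency', 'price']]


df

This gives: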

    Name   number countryCode currency price
0  name1  number1          DZ      RMB    62
1  name1  number1          AU      AUD    73
3  name2  number2          DZ      RMB    63
4  name2  number2          US      USD    72
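
If json_normalize refuses the dict at record_path on your pandas version, the same table can be sketched with a plain loop over the records. This is only one possible approach, and it assumes that every real price entry is a dict with the keys p and currency, while non-price entries such as lg / Lg are plain strings. The names rows, item, code and entry are illustrative:

```python
import pandas as pd

j = {"a": [{"Name": "name1",
            "number": "number1",
            "defaultPrice": {"p": "232", "currency": "CAD"},
            "prices": {"DZ": {"p": "62", "currency": "RMB"},
                       "AU": {"p": "73", "currency": "AUD"},
                       "lg": "en"}},
           {"Name": "name2",
            "number": "number2",
            "defaultPrice": {"p": "233", "currency": "CAD"},
            "prices": {"DZ": {"p": "63", "currency": "RMB"},
                       "US": {"p": "72", "currency": "USD"},
                       "Lg": "en"}}]}

rows = []
for item in j["a"]:
    for code, entry in item["prices"].items():
        # Skip entries that are not price dicts (here: "lg" / "Lg")
        if not isinstance(entry, dict):
            continue
        rows.append({"Name": item["Name"],
                     "number": item["number"],
                     "countryCode": code,
                     "currency": entry["currency"],
                     "price": entry["p"]})

# One row per (record, country code) pair
df = pd.DataFrame(rows, columns=["Name", "number", "countryCode",
                                 "currency", "price"])
print(df)
```

The rows match the table above, except that the index runs from 0 to 3 because no rows are filtered out after the frame is built.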