json到DataFrame的转换问题

时间:2017-12-04 08:02:52

标签: python json pandas dictionary dataframe

{'endDate': '2017-12-31',
 'results': [{'data': [{'period': '2017-01-01', 'ratio': 26.91301},
    {'period': '2017-02-01', 'ratio': 19.77063},
    {'period': '2017-03-01', 'ratio': 20.40775},
    {'period': '2017-04-01', 'ratio': 16.02843},
    {'period': '2017-05-01', 'ratio': 10.38159},
    {'period': '2017-06-01', 'ratio': 8.2087},
    {'period': '2017-07-01', 'ratio': 8.67815},
    {'period': '2017-08-01', 'ratio': 19.58956},
    {'period': '2017-09-01', 'ratio': 36.94587},
    {'period': '2017-10-01', 'ratio': 36.28194},
    {'period': '2017-11-01', 'ratio': 16.64543},
    {'period': '2017-12-01', 'ratio': 1.67661}],
   'keywords': ['Data_spec'],
   'title': 'Data_spec'},
  {'data': [{'period': '2017-01-01', 'ratio': 17.65139},
    {'period': '2017-02-01', 'ratio': 14.52618},
    {'period': '2017-03-01', 'ratio': 15.00234},
    {'period': '2017-04-01', 'ratio': 12.1521},
    {'period': '2017-05-01', 'ratio': 10.63644},
    {'period': '2017-06-01', 'ratio': 8.59767},
    {'period': '2017-07-01', 'ratio': 8.95312},
    {'period': '2017-08-01', 'ratio': 13.05747},
    {'period': '2017-09-01', 'ratio': 48.00482},
    {'period': '2017-10-01', 'ratio': 23.7811},
    {'period': '2017-11-01', 'ratio': 16.90027},
    {'period': '2017-12-01', 'ratio': 0.89866}],
   'keywords': ['Data_rate'],
   'title': 'Date_rate'},
  {'data': [], 'keywords': ['Data_over'], 'title': 'Data_over'},
  {'data': [{'period': '2017-01-01', 'ratio': 79.17644},
    {'period': '2017-02-01', 'ratio': 84.01851},
    {'period': '2017-03-01', 'ratio': 100.0},
    {'period': '2017-04-01', 'ratio': 91.19442},
    {'period': '2017-05-01', 'ratio': 93.21976},
    {'period': '2017-06-01', 'ratio': 93.42096},
    {'period': '2017-07-01', 'ratio': 89.14895},
    {'period': '2017-08-01', 'ratio': 91.85165},
    {'period': '2017-09-01', 'ratio': 91.24136},
    {'period': '2017-10-01', 'ratio': 90.35611},
    {'period': '2017-11-01', 'ratio': 81.88585},
    {'period': '2017-12-01', 'ratio': 7.49111}],
   'keywords': ['Data_under'],
   'title': 'Data_under'},
  {'data': [{'period': '2017-01-01', 'ratio': 0.70417},
    {'period': '2017-02-01', 'ratio': 1.11997},
    {'period': '2017-03-01', 'ratio': 1.81074},
    {'period': '2017-04-01', 'ratio': 1.38823},
    {'period': '2017-05-01', 'ratio': 0.97914},
    {'period': '2017-06-01', 'ratio': 1.14009},
    {'period': '2017-07-01', 'ratio': 0.78465},
    {'period': '2017-08-01', 'ratio': 1.07973},
    {'period': '2017-09-01', 'ratio': 0.94561},
    {'period': '2017-10-01', 'ratio': 0.85172},
    {'period': '2017-11-01', 'ratio': 1.27422},
    {'period': '2017-12-01', 'ratio': 0.08718}],
   'keywords': ['Data_tune'],
   'title': 'Data_tune'}],
 'startDate': '2017-01-01',
 'timeUnit': 'month'}

上面是'my_dict = json.loads(js)',其中js是字符串。我试图将这些数据放到Pandas DataFrame中。我使用下面的代码。

lst = [pd.DataFrame.from_dict(r['data']).set_index('period').rename(columns={'ratio' : r['title']})
           for r in d['results']]
df = pd.concat(lst, 1)

我的代码完美运行,直到这个js set为空值。 一个问题是你注意到'keyworks':['Data_over']有空'数据'。所以我不能将索引设置为'period'。我仍然想在我的熊猫DF中使用'Data_over'但是空的。是否可以将DataFrame设置为“Data_over”作为列名但值为空?所以我的最终代码可以使用或不使用'数据'将json转换为df。

以下是我想要的输出。 (实际值不同,但你有了概念)

    Data_spec    Data_rate   Data_over   Data_under   Data_tune
2017-01-01  0.55116     NaN     NaN         7.12056     2.25329
2017-02-01  0.32016     0.08915     NaN         6.43161     1.19959
2017-03-01  0.32421     0.10131     NaN         6.48024     1.30091
2017-04-01  0.33232     0.01215     NaN         6.05471     1.26038
2017-05-01  0.39311     0.12968     NaN         6.19655     1.21985
2017-06-01  0.47011     0.03647     NaN         5.71023     1.03748
2017-07-01  4.32016     NaN     NaN         11.85005    0.84295
2017-08-01  8.81053     0.04052     NaN         51.44072    0.89564
2017-09-01  14.46808    0.02836     NaN         100.00000   0.85511
2017-10-01  4.27152     0.10942     NaN         34.65451    0.87132
2017-11-01  0.29989     0.05673     NaN         13.02127    0.77811
2017-12-01  0.00810     0.06079     NaN         0.80243     NaN

1 个答案:

答案 0 :(得分:1)

可以length检查len(r['data']) > 0并过滤list comprehension中的空数据框:

lst = [pd.DataFrame(r['data']).set_index('period').rename(columns={'ratio' : r['title']})
           for r in d['results'] if len(r['data']) > 0]
df = pd.concat(lst, 1)
print (df)

            Data_spec  Date_rate  Data_under  Data_tune
period                                                 
2017-01-01   26.91301   17.65139    79.17644    0.70417
2017-02-01   19.77063   14.52618    84.01851    1.11997
2017-03-01   20.40775   15.00234   100.00000    1.81074
2017-04-01   16.02843   12.15210    91.19442    1.38823
2017-05-01   10.38159   10.63644    93.21976    0.97914
2017-06-01    8.20870    8.59767    93.42096    1.14009
2017-07-01    8.67815    8.95312    89.14895    0.78465
2017-08-01   19.58956   13.05747    91.85165    1.07973
2017-09-01   36.94587   48.00482    91.24136    0.94561
2017-10-01   36.28194   23.78110    90.35611    0.85172
2017-11-01   16.64543   16.90027    81.88585    1.27422
2017-12-01    1.67661    0.89866     7.49111    0.08718

编辑:

如果r['data']为空,可能会创建自定义DataFrame,因为使用了对齐索引d['startDate']

lst = [pd.DataFrame(r['data']).set_index('period').rename(columns={'ratio' : r['title']}) 
        if len(r['data']) > 0 
        else pd.DataFrame([np.nan], columns=[r['title']], index=[d['startDate']])
        for r in d['results'] ]
df = pd.concat(lst, 1)
print (df)
            Data_spec  Date_rate  Data_over  Data_under  Data_tune
2017-01-01   26.91301   17.65139        NaN    79.17644    0.70417
2017-02-01   19.77063   14.52618        NaN    84.01851    1.11997
2017-03-01   20.40775   15.00234        NaN   100.00000    1.81074
2017-04-01   16.02843   12.15210        NaN    91.19442    1.38823
2017-05-01   10.38159   10.63644        NaN    93.21976    0.97914
2017-06-01    8.20870    8.59767        NaN    93.42096    1.14009
2017-07-01    8.67815    8.95312        NaN    89.14895    0.78465
2017-08-01   19.58956   13.05747        NaN    91.85165    1.07973
2017-09-01   36.94587   48.00482        NaN    91.24136    0.94561
2017-10-01   36.28194   23.78110        NaN    90.35611    0.85172
2017-11-01   16.64543   16.90027        NaN    81.88585    1.27422
2017-12-01    1.67661    0.89866        NaN     7.49111    0.08718