Question

I want to put some JSON data into a pandas DataFrame. The JSON looks like this:

{'date': [20170629,
  20170630,
  20170703,
  20170705,
  20170706,
  20170707],
 'errorMessage': None,
 'seriesarr': [{'chartOnlyFlag': 'false',
   'dqMaxValidStr': None,
   'expression': 'DB(FXO,V1,EUR,USD,7D,VOL)',
   'freq': None,
   'frequency': None,
   'iDailyDates': None,
   'label': '',
   'message': None,
   'plotPoints': [0.0481411225888,
    0.0462401214563,
    0.0587196848727,
    0.0765737640932,
    0.0678912611279,
    0.0675766942022],
   }

I am trying to create a pandas DataFrame with 'date' as the index and 'plotPoints' as the second column. I don't need any of the other information.

I have tried

df = pd.io.json.json_normalize(data, record_path = 'date', meta = ['seriesarr', ['plotPoints']])

When I do this, I get the following error:

KeyError: ("Try running with errors='ignore' as key %s is not always present", KeyError('plotPoints',)

Any help with this is appreciated.

Thanks!

Answer 1

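IIUC, json_normalize may not be able to help you here. Instead, it is probably easier to extract the data yourself and load it directly into a DataFrame. If needed, convert the dates to datetime using pd.to_datetime: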

date = data.get('date')
plotPoints = data.get('seriesarr')[0].get('plotPoints')

df = pd.DataFrame({'date' : pd.to_datetime(date, format='%Y%m%d'),
                   'plotPoints' : plotPoints})
df
        date  plotPoints
0 2017-06-29    0.048141
1 2017-06-30    0.046240
2 2017-07-03    0.058720
3 2017-07-05    0.076574
4 2017-07-06    0.067891
5 2017-07-07    0.067577

This assumes your data is exactly as shown in the question.
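The question asks for 'date' as the index rather than as a regular column. A minimal sketch of that variant follows, assuming the data has the shape shown in the question; the sample dict here is trimmed to the fields actually used, and the dates are converted to strings before parsing to keep the format handling explicit.

```python
import pandas as pd

# Sample data copied from the question, trimmed to the fields used here.
data = {
    'date': [20170629, 20170630, 20170703, 20170705, 20170706, 20170707],
    'errorMessage': None,
    'seriesarr': [{
        'expression': 'DB(FXO,V1,EUR,USD,7D,VOL)',
        'plotPoints': [0.0481411225888, 0.0462401214563, 0.0587196848727,
                       0.0765737640932, 0.0678912611279, 0.0675766942022],
    }],
}

# Parse the YYYYMMDD integers into a DatetimeIndex and name it 'date'.
index = pd.to_datetime([str(d) for d in data['date']], format='%Y%m%d')
index.name = 'date'

# Use the first (and here only) entry of 'seriesarr' as the single column.
df = pd.DataFrame({'plotPoints': data['seriesarr'][0]['plotPoints']},
                  index=index)
print(df)
```

Building the index first avoids a separate set_index call, though calling df.set_index('date') on the frame above should give an equivalent result.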

Answer 2

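As @COLDSPEED pointed out, getting the data directly from the dictionary is appropriate, since 'plotPoints' is contained in a list of dictionaries.

A comprehension-based variant is shown below, with the dates as the index and the plot points as the column: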

col1 = data['date']
# Merge every dict in 'seriesarr' into one (Python 3: .items() replaces .iteritems())
adict = {k: v for d in data['seriesarr'] for k, v in d.items()}
col2 = adict['plotPoints']
pd.DataFrame(data=col2, index=col1)

>>>              0
20170629  0.048141
20170630  0.046240
20170703  0.058720
20170705  0.076574
20170706  0.067891
20170707  0.067577
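Merging all the dictionaries into one keeps only the last 'plotPoints' if 'seriesarr' ever holds more than one entry. The sketch below is one possible way to keep every series as its own column. The helper name series_to_frame, the fallback column names, and the second series in the sample (name and values) are all made up for illustration and are not part of the original data.

```python
import pandas as pd

def series_to_frame(data):
    """Return a DataFrame indexed by date with one column per 'seriesarr' entry."""
    index = pd.to_datetime([str(d) for d in data['date']], format='%Y%m%d')
    index.name = 'date'
    columns = {}
    for i, series in enumerate(data['seriesarr']):
        # Label each column by its 'expression'; fall back to a positional name.
        name = series.get('expression') or 'series_{}'.format(i)
        columns[name] = series['plotPoints']
    return pd.DataFrame(columns, index=index)

# Hypothetical input: the first series uses values from the question,
# the second is invented purely to show the multi-series case.
sample = {
    'date': [20170629, 20170630, 20170703],
    'seriesarr': [
        {'expression': 'DB(FXO,V1,EUR,USD,7D,VOL)',
         'plotPoints': [0.0481411225888, 0.0462401214563, 0.0587196848727]},
        {'expression': 'EXAMPLE_SERIES_B',
         'plotPoints': [0.1, 0.2, 0.3]},
    ],
}

result = series_to_frame(sample)
print(result)
```

This assumes every 'plotPoints' list has the same length as 'date'; pandas raises a ValueError otherwise.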

Selectively loading JSON data into a DataFrame
