python json.loads to pandas dataframe

Time: 2017-09-25 23:13:13

Tags: python python-2.7 pandas

I have a URL that returns JSON data like the following:

{
    u'fields': [{
            u'keyField': False,
            u'name': u'_blockid',
            u'fieldType': u'long'
        }, {
            u'keyField': False,
            u'name': u'_collector',
            u'fieldType': u'string'
        }, {
            u'keyField': False,
            u'name': u'_collectorid',
            u'fieldType': u'long'
        }, {
            u'keyField': False,
            u'name': u'_messageid',
            u'fieldType': u'long'
        }
    ],
    u'messages': [{
            u'map': {
                u'_messageid': u'-9223368783568280026',
                u'_collectorid': u'135927517',
                u'_blockid': u'-9223372036519990555',
                u'_collector': u'collector1',
            }
        }, {
            u'map': {
                u'_messageid': u'-92233645345280026',
                u'_collectorid': u'13545342517',
                u'_blockid': u'-92234254242343219990555',
                u'_collector': u'collector2',
            }
        }
    ]
}

This is only a snippet. The real JSON contains thousands of values under ['messages']['map'].

I have a script that runs as follows:

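import json
import numpy as np
import pandas as pd
import requests

# JsonURL, username and password are assumed to be defined elsewhere
rJSON = requests.get(JsonURL, auth=(username, password))
DATA = json.loads(rJSON.text)
for x in DATA[u'messages']:
    print type(x[u'map'])
    for i in x[u'map']:
        print np.isscalar(x[u'map'][i])

    # This is the line that raises the ValueError shown below
    df = pd.DataFrame.from_dict(x[u'map'])
    break ### TESTING ###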

This outputs the following:

<type 'dict'>
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-151-1b71c28d4d83> in <module>()
     11     for i in x[u'map']:
     12         print np.isscalar(q[i])
---> 13     df = pd.DataFrame.from_dict(x[u'map'])
     14 
     15     #if isinstance(msgData, pd.DataFrame): # If the variable is a dataframe, append to it...

C:\Users\USERID\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.pyc in from_dict(cls, data, orient, dtype)
    849             raise ValueError('only recognize index or columns for orient')
    850 
--> 851         return cls(data, index=index, columns=columns, dtype=dtype)
    852 
    853     def to_dict(self, orient='dict'):

C:\Users\USERID\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.pyc in __init__(self, data, index, columns, dtype, copy)
    273                                  dtype=dtype, copy=copy)
    274         elif isinstance(data, dict):
--> 275             mgr = self._init_dict(data, index, columns, dtype=dtype)
    276         elif isinstance(data, ma.MaskedArray):
    277             import numpy.ma.mrecords as mrecords

C:\Users\USERID\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.pyc in _init_dict(self, data, index, columns, dtype)
    409             arrays = [data[k] for k in keys]
    410 
--> 411         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    412 
    413     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

C:\Users\USERID\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5494     # figure out the index, if necessary
   5495     if index is None:
-> 5496         index = extract_index(arrays)
   5497     else:
   5498         index = _ensure_index(index)

C:\Users\USERID\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.pyc in extract_index(data)
   5533 
   5534         if not indexes and not raw_lengths:
-> 5535             raise ValueError('If using all scalar values, you must pass'
   5536                              ' an index')
   5537 

ValueError: If using all scalar values, you must pass an index

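I understand that it is complaining because the dictionary contains scalar values, but I can't figure out why json.loads() loads them into the dictionary as scalars, or how to convert them from scalars to strings.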

My end goal is to take all of the ['messages']['map'] data and pd.concat it in a loop into one large dataframe that I can analyze.

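Is it possible to stop json.loads from loading them as scalars? Or is there a way to convert them from scalars into something else that can be loaded into a dataframe?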

1 Answer:

Answer 0: (score: 2)

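The messages in your data are a list of dictionaries. You can load that list with DataFrame.from_records, then use apply(pd.Series) to expand the inner dictionaries into the rows of the final data frame: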

# ['map'] rather than .map, since newer pandas versions define a DataFrame.map method
pd.DataFrame.from_records(data['messages'])['map'].apply(pd.Series)

#                   _blockid  _collector _collectorid            _messageid
#0      -9223372036519990555  collector1    135927517  -9223368783568280026
#1  -92234254242343219990555  collector2  13545342517    -92233645345280026
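
As a possible alternative, here is a minimal sketch of why the original call fails and of one way to build the frame without a loop or pd.concat. The `data` dict below is a hypothetical stand-in that copies the structure and values from the question. In the real script it would be the result of json.loads. The values stay as strings here, and no numeric conversion is attempted.

```python
import pandas as pd

# Hypothetical sample mirroring the structure shown in the question.
data = {
    "messages": [
        {"map": {"_messageid": "-9223368783568280026",
                 "_collectorid": "135927517",
                 "_blockid": "-9223372036519990555",
                 "_collector": "collector1"}},
        {"map": {"_messageid": "-92233645345280026",
                 "_collectorid": "13545342517",
                 "_blockid": "-92234254242343219990555",
                 "_collector": "collector2"}},
    ]
}

first_map = data["messages"][0]["map"]

# A flat dict of scalars gives pandas no row index to infer,
# which is what triggers the ValueError from the question.
try:
    pd.DataFrame.from_dict(first_map)
    raised = False
except ValueError:
    raised = True

# Wrapping the dict in a list makes it a single record (one row).
one_row = pd.DataFrame([first_map])

# A list of all inner dicts yields one row per message in a single call.
df = pd.DataFrame([m["map"] for m in data["messages"]])
```

With this approach json.loads does not need to behave differently: each inner dict is treated as one record, so the scalar values become the cells of that row.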