PYTHON 2.7 - Parsing a JSON file with a weird structure

Time: 2016-03-14 20:41:19

Tags: json python-2.7 parsing dataframe

I have a JSON file that I have tried to import into Python 2.7 in several different ways.

Here is my data:

[["195b95d248e5478485bfdff82ed7504a", {"attributes": {"checkin_payment_rate": {"N": "10"}, "dateBooked": {"S": "2015-11-03"}, "dateCheckin": {"S": "2015-11-03T15:41:40.126034+0000"}, "date_created": {"S": "2015-11-03T15:41:29.546868+0000"}, "spaceID": {"S": "67dcfcf3fafe4cde9e50069cdbff2314"}, "stripe_transferID": {"S": "tr_1736umJLCycAnsZaf52drYC0"}, "userID": {"S": "b0c096530f464c1fb2cba8ed5470bbc6"}}}], ["413b1dfe841c4f95b2169da369179cd1", {"attributes": {"checkin_payment_rate": {"N": "10"}, "dateBooked": {"S": "2015-09-11"}, "dateCheckin": {"S": "2015-09-11T20:22:40.218580+0000"}, "date_created": {"S": "2015-09-11T18:39:33.374925+0000"}, "spaceID": {"S": "8c85543487ba49dd816f9b1eceafd3ca"}, "stripe_transferID": {"S": "tr_16jy2eJLCycAnsZatj0aVWyB"}, "userID": {"S": "38522c00725245f58f58cca01a8b62c7"}}}], 

Which... as you can see... is just one giant line.

Here is what the code looks like when I run a simple load command and pprint.

import json
import pandas as pd
from pandas.io.json import json_normalize 
from pprint import pprint


with open('example.json') as json_data:
    data = json.load(json_data)


pprint(data)

Here is the result:

[[u'195b95d248e5478485bfdff82ed7504a',
  {u'attributes': {u'checkin_payment_rate': {u'N': u'10'},
               u'dateBooked': {u'S': u'2015-09-03'},
               u'dateCheckin': {u'S': u'2015-11-03T15:41:40.126034+0000'},
               u'date_created': {u'S': u'2015-11-03T15:41:29.546868+0000'},
               u'spaceID': {u'S': u'67dcfcf3fafe4cde9e50069cdbff2314'},
               u'stripe_transferID': {u'S': u'hr_9876umJLCycAnsZaf52drYC0'},
               u'userID': {u'S': u'c9df86530f464c1fb2cba8ed5470bbc6'}}}],
 [u'413b1dfe841c4f95b2169da369179cd1',
  {u'attributes': {u'checkin_payment_rate': {u'N': u'10'},
               u'dateBooked': {u'S': u'2015-04-11'},
               u'dateCheckin': {u'S': u'2015-09-11T20:22:40.218580+0000'},
               u'date_created': {u'S': u'2015-09-11T18:39:33.374925+0000'},
               u'spaceID': {u'S': u'8c85543487ba49dd816f9b1eceafd3ca'},
               u'stripe_transferID': {u'S': u'gr_76jy2eJLCycAnsZatj0aVWyB'},
               u'userID': {u'S': u'36536c00725245f58f58cca01a8b62c7'}}}],

My goal is to turn this data into a nice, tidy DataFrame with one overall ID column (in the first case, u'195b95d248e5478485bfdff82ed7504a') and a separate column for each attribute on the same row.

I tried

test1 = pd.read_json("example.json","records","frame") 

which gave me this:

                                     0  \                                                      
0     195b95d248e5478485bfdff82ed7504a   
1     413b1dfe841c4f95b2169da369179cd1   
                                                  1  
0     {u'attributes': {u'stripe_transferID': {u'S': ...  
1     {u'attributes': {u'stripe_transferID': {u'S': ...  

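This is good in that it gives me an overall ID column next to an attributes column... but all of the attributes for each ID are lumped together in the second column.

I also tried the pandas normalize option below: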

test2 = pd.io.json.json_normalize(data,'attributes',['stripe_transferID','dateCheckin','userID','spaceID','date_created','dateBooked','checkin_payment_rate','N'])

But I keep getting the error: list indices must be integers, not str

Any ideas on how to split the second column of test1 into separate columns, or how to make test2 work? Thanks, guys!

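1 Answer:

Answer 0: (score: 1)

I'm not sure what your exact situation is, but if every element always has the same structure, you can get the exact output you are after with basic Python objects. In this case I load the data as a string: I just copy-pasted your example, fixed the brackets so that it looks like a proper element (here by adding a ']' at the end so that it looks like a list), and used ast to turn it into a real list: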

import ast
import pandas as pd

l =    '''[["195b95d248e5478485bfdff82ed7504a", {"attributes":{"checkin_payment_rate": {"N": "10"}, 
    "dateBooked": {"S": "2015-11-03"}, "dateCheckin": {"S": "2015-11-03T15:41:40.126034+0000"},
    "date_created": {"S": "2015-11-03T15:41:29.546868+0000"}, "spaceID": {"S": "67dcfcf3fafe4cde9e50069cdbff2314"},
    "stripe_transferID": {"S": "tr_1736umJLCycAnsZaf52drYC0"}, "userID": {"S": "b0c096530f464c1fb2cba8ed5470bbc6"}}}],
   ["413b1dfe841c4f95b2169da369179cd1", {"attributes": {"checkin_payment_rate": {"N": "10"}, 
    "dateBooked": {"S": "2015-09-11"}, "dateCheckin": {"S": "2015-09-11T20:22:40.218580+0000"}, 
    "date_created": {"S": "2015-09-11T18:39:33.374925+0000"}, "spaceID": {"S": "8c85543487ba49dd816f9b1eceafd3ca"}, "stripe_transferID": 
    {"S": "tr_16jy2eJLCycAnsZatj0aVWyB"}, "userID": {"S": "38522c00725245f58f58cca01a8b62c7"}}}]]'''

data = ast.literal_eval(l)
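Because the repaired text is also valid JSON, json.loads should work here as well, and it would keep working if the data ever contained true, false, or null, which ast.literal_eval does not accept. A minimal sketch, assuming a shortened sample (two attributes per record instead of seven) that ends with the same dangling comma as the snippet in the question; the names raw and fixed are only illustrative:

```python
import json

# Shortened sample in the same shape as the question's data.
# Like the pasted snippet, it ends with ", " and has no closing bracket.
raw = '''[["195b95d248e5478485bfdff82ed7504a", {"attributes": {"checkin_payment_rate": {"N": "10"}, "dateBooked": {"S": "2015-11-03"}}}],
["413b1dfe841c4f95b2169da369179cd1", {"attributes": {"checkin_payment_rate": {"N": "10"}, "dateBooked": {"S": "2015-09-11"}}}], '''

# Drop trailing whitespace and the dangling comma, then close the outer list.
fixed = raw.rstrip().rstrip(',') + ']'

# json.loads returns the same nested list/dict structure as json.load on a file.
data = json.loads(fixed)
```

If json.load on the real file already succeeds, as it does in the question, none of this repair is needed and data can be used directly.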

From here on it is plain Python. Create a custom function:

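def Parse(e):
    # Keep only the value of each {"S": ...} / {"N": ...} pair, dropping the S and N indicators.
    # list(...) also keeps this working on Python 3, where dict.values() cannot be indexed.
    dic = {k: list(v.values())[0] for (k, v) in e[1]['attributes'].items()}
    dic['id'] = e[0]  # get the ID
    return dic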

Apply it to every element:

pd.DataFrame([Parse(e) for e in data])

You should get this (hopefully what you are looking for):

[image: partial output]

Note that the columns come out in alphabetical order because each row is built from a dictionary.
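If you want the ID first and the numeric field as a real number, the flattening step can do both. The sketch below is one possible variation, not the only way: it assumes the "N" tag marks a number stored as a string and that every other tag is plain text (the question does not say where the tags come from), and it uses a shortened two-record sample. The names flatten, rows, and columns are hypothetical.

```python
# Shortened sample in the same shape as the parsed data from the question.
data = [
    ["195b95d248e5478485bfdff82ed7504a",
     {"attributes": {"checkin_payment_rate": {"N": "10"},
                     "dateBooked": {"S": "2015-11-03"}}}],
    ["413b1dfe841c4f95b2169da369179cd1",
     {"attributes": {"checkin_payment_rate": {"N": "10"},
                     "dateBooked": {"S": "2015-09-11"}}}],
]


def flatten(entry):
    """Turn [id, {"attributes": {name: {tag: value}}}] into one flat dict."""
    record_id, body = entry
    row = {"id": record_id}
    for name, tagged in body["attributes"].items():
        # Each attribute is assumed to hold exactly one {tag: value} pair.
        tag, value = next(iter(tagged.items()))
        # Assumption: "N" means a number stored as a string; other tags stay as text.
        row[name] = float(value) if tag == "N" else value
    return row


rows = [flatten(e) for e in data]

# Explicit column order: the ID first, then the attribute names alphabetically.
columns = ["id"] + sorted(k for k in rows[0] if k != "id")
```

Passing both to pandas as pd.DataFrame(rows, columns=columns) should then give a frame with id as the first column instead of leaving the order up to pandas.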