How do I extract objects from a csv / pandas DataFrame of JSON files?

Asked: 2019-05-28 17:15:08

Tags: python json

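I have a csv (which I converted to a pandas DataFrame) in which each row contains a different JSON file. Every JSON file has the same format and objects as the others, and each one represents a unique transaction (a purchase). I would like to take this DataFrame and convert it into a DataFrame or Excel file in which each column represents an object from the JSON file and each row represents one transaction.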

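The JSON also contains arrays, and in that case I would like to be able to retrieve every element of the array. Ideally, I would like to retrieve all possible objects from the JSON file and turn them into columns.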

A simplified version of one row would be:

JSON

And my Python code

    {  
       "source":{  
          "analyze":true,
          "billing":{  
             "gender":null,
             "name":"xxxxx",
             "phones":[  
                {  
                   "area_code":"xxxxx",
                   "country_code":"xxxxx",
                   "number":"xxxxx",
                   "phone_type":"xxxxx"
                }
             ]
          },
          "created_at":"xxxxx",
          "customer":{  
             "address":{  
                "city":"xxxxx",
                "complement":"xxxxx",
                "country":"xxxxx",
                "neighborhood":"xxxxx",
                "number":"xxxxx",
                "state":"xxxxx",
                "street":"xxxxx",
                "zip_code":"xxxxx"
             },
             "date_of_birth":"xxxxx",
             "documents":[  
                {  
                   "document_type":"xxxxx",
                   "number":"xxxxx"
                }
             ],
             "email":"xxxxx",
             "gender":xxxxx,
             "name":"xxxxx",
             "number_of_previous_orders":xxxxx,
             "phones":[  
                {  
                   "area_code":"xxxxx",
                   "country_code":"xxxxx",
                   "number":"xxxxx",
                   "phone_type":"xxxxx"
                }
             ],
             "register_date":xxxxx,
             "register_id":"xxxxx"
          },
          "device":{  
             "ip":"xxxxx",
             "lat":"xxxxx",
             "lng":"xxxxx",
             "platform":xxxxx,
             "session_id":xxxxx
          }
    }
    }

My expected output, simplified, would be

Expected Output

1 Answer:

Answer 0: (score: 0)

Do you mean an output like this, for example extracting area_code:

        A_col                                          area_code
0   {"source":{"analyze":true,"billing":{"gender":...   xxxxx

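First:

The values of "gender", "number_of_previous_orders", "register_date", "platform" and "session_id" (written as a bare xxxxx in the sample) should be wrapped in double quotes. Otherwise the text is not valid JSON and cannot be parsed.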

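To illustrate why the quoting matters, here is a minimal sketch using only the standard `json` module. The two one-field strings and the helper `is_valid_json` are hypothetical, modelled on the placeholders in the question rather than taken from the real data.

```python
import json

# Hypothetical one-field snippets modelled on the question's placeholders.
unquoted = '{"gender": xxxxx}'    # bare token: not valid JSON
quoted = '{"gender": "xxxxx"}'    # the same value written as a JSON string


def is_valid_json(text):
    """Return True if json.loads accepts the text, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


unquoted_ok = is_valid_json(unquoted)
quoted_ok = is_valid_json(quoted)
print(unquoted_ok, quoted_ok)
```

If the real files contain actual values (numbers, `null`, `true`) in those positions rather than a literal xxxxx, they would already parse without quotes.
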
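Read the JSON document:

import json
import pandas as pd

# Read the file line by line, stripping surrounding whitespace from each line
newjson = []
with open('./example.json', 'r') as f:
    for line in f:
        line = line.strip()
        newjson.append(line)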

Join the lines into a single string:

jsonString = ''.join(newjson)

Parse it into a Python object:

jsonData = json.loads(jsonString)
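
As a side note, when only the parsed object is needed, `json.load` can read from the open file handle directly. The sketch below writes a small hypothetical sample to a temporary file (the file contents and variable names are assumptions for illustration) and checks that both routes give the same result.

```python
import json
import os
import tempfile

# Hypothetical pretty-printed sample standing in for example.json
sample = '{\n   "source": {\n      "analyze": true\n   }\n}\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    with open(path, "w") as f:
        f.write(sample)

    # Route 1: strip each line, join into one string, then parse the string
    with open(path, "r") as f:
        joined = "".join(line.strip() for line in f)
    from_lines = json.loads(joined)

    # Route 2: let json.load parse straight from the file handle
    with open(path, "r") as f:
        from_file = json.load(f)

print(from_lines == from_file)
```

The line-by-line route is still useful when the compact string itself is needed, as it is for the A_col column in this answer.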

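Use dictionary operations to extract the field and convert it to a pandas DataFrame:

# One-row DataFrame: the raw JSON string plus the area_code of the first billing phone
newDF = pd.DataFrame({"A_col": jsonString, "area_code": jsonData['source']['billing']['phones'][0]['area_code']}, index=[0])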
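
The line above pulls out a single field. Since the question asks for every object and every array element as its own column, one possible extension is to flatten each parsed document into a flat dictionary first. The following is a sketch under assumptions, not part of the original answer: the `flatten` helper, the dotted column naming and the two sample rows are all hypothetical.

```python
import json


def flatten(value, prefix=""):
    """Recursively flatten nested dicts and lists into one flat dict.

    Dict keys are joined with '.', and list elements use their index as the key.
    """
    flat = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Leaf value (string, number, bool, None): store it under the path so far
        flat[prefix] = value
        return flat
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


# Hypothetical rows, each holding one JSON document as a string
raw_rows = [
    '{"source": {"analyze": true, "billing": {"phones": [{"area_code": "11"}]}}}',
    '{"source": {"analyze": false, "billing": {"phones": [{"area_code": "21"}, {"area_code": "31"}]}}}',
]

records = [flatten(json.loads(text)) for text in raw_rows]
columns = sorted({name for record in records for name in record})
print(columns)
```

Passing `records` to `pd.DataFrame(records)` should then give one row per transaction and one column per path, with missing paths left empty (NaN). `pandas.json_normalize` is a built-in alternative for nested dictionaries, though by default it leaves lists unexpanded inside the cells.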