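I have a CSV (which I converted to a pandas DataFrame) in which each row contains a different JSON file. Each JSON file has the same format and objects as the others, and each one represents a unique transaction (purchase). I want to take this DataFrame and convert it into a DataFrame or Excel file in which each column represents an object from the JSON file and each row represents one transaction.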
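The JSON also contains arrays, and in that case I want to be able to retrieve each element of the array. Ideally, I want to retrieve every possible object from the JSON file and turn each one into a column.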
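One way to get "every possible object as a column, including each array element" is to flatten each parsed JSON document into a single dict whose keys are dotted paths. Array elements get their index in the path, so each one ends up in its own column. The sketch below is minimal. `flatten_json` is a hypothetical helper written for illustration, not a pandas or standard-library function.

```python
import json


def flatten_json(obj, prefix=""):
    """Flatten nested dicts/lists into one dict mapping dotted paths to scalar values.

    List elements get their index in the path (e.g. "phones.0.area_code").
    Empty dicts/lists produce no keys at all.
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten_json(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(flatten_json(value, f"{prefix}{i}."))
    else:
        # Scalar (str, number, bool, None): drop the trailing "." from the path.
        flat[prefix[:-1]] = obj
    return flat


row = json.loads(
    '{"source": {"analyze": true, "billing": {"gender": null, '
    '"phones": [{"area_code": "11"}, {"area_code": "21"}]}}}'
)
print(flatten_json(row))
# {'source.analyze': True, 'source.billing.gender': None, 'source.billing.phones.0.area_code': '11', 'source.billing.phones.1.area_code': '21'}
```

Suppose the JSON strings sit in a column that we call `df["json"]` here (a hypothetical name). Then `pd.DataFrame([flatten_json(json.loads(s)) for s in df["json"]])` would give one row per transaction and one column per path. Any path missing from a given transaction becomes NaN.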
A simplified version of the JSON in one row is:
{
  "source":{
    "analyze":true,
    "billing":{
      "gender":null,
      "name":"xxxxx",
      "phones":[
        {
          "area_code":"xxxxx",
          "country_code":"xxxxx",
          "number":"xxxxx",
          "phone_type":"xxxxx"
        }
      ]
    },
    "created_at":"xxxxx",
    "customer":{
      "address":{
        "city":"xxxxx",
        "complement":"xxxxx",
        "country":"xxxxx",
        "neighborhood":"xxxxx",
        "number":"xxxxx",
        "state":"xxxxx",
        "street":"xxxxx",
        "zip_code":"xxxxx"
      },
      "date_of_birth":"xxxxx",
      "documents":[
        {
          "document_type":"xxxxx",
          "number":"xxxxx"
        }
      ],
      "email":"xxxxx",
      "gender":xxxxx,
      "name":"xxxxx",
      "number_of_previous_orders":xxxxx,
      "phones":[
        {
          "area_code":"xxxxx",
          "country_code":"xxxxx",
          "number":"xxxxx",
          "phone_type":"xxxxx"
        }
      ],
      "register_date":xxxxx,
      "register_id":"xxxxx"
    },
    "device":{
      "ip":"xxxxx",
      "lat":"xxxxx",
      "lng":"xxxxx",
      "platform":xxxxx,
      "session_id":xxxxx
    }
  }
}
My expected output, simplified, would be:
Answer 0 (score: 0)
Do you mean output like this, for example extracting area_code:
A_col area_code
0 {"source":{"analyze":true,"billing":{"gender":... xxxxx
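First, these values are bare (unquoted) xxxxx placeholders:
"gender":xxxxx, "number_of_previous_orders":xxxxx, "register_date":xxxxx, "platform":xxxxx, "session_id":xxxxx,
They should be wrapped in double quotes. As written, the document is not valid JSON and cannot be parsed.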
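The quoting problem is easy to confirm with the standard `json` module, which rejects a bare word where it expects a value. The sketch below is minimal, and `is_valid_json` is a hypothetical helper for illustration. Quotes are only required if `xxxxx` stands in for text. A bare number, `true`, `false` or `null` would be valid JSON without quotes.

```python
import json


def is_valid_json(text):
    """Return True if json.loads can parse text; otherwise print the parser error and return False."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        print("invalid JSON:", err)
        return False
    return True


bad = '{"gender":xxxxx, "name":"xxxxx"}'     # bare xxxxx, as in the question
good = '{"gender":"xxxxx", "name":"xxxxx"}'  # same value wrapped in double quotes
print(is_valid_json(bad))   # prints the parser's error message, then False
print(is_valid_json(good))  # True
```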
Read the JSON document:
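import json
import pandas as pd

# Read the file line by line, stripping surrounding whitespace from each line
newjson = []
with open('./example.json', 'r') as f:
    for line in f:
        line = line.strip()
        newjson.append(line)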
Join the lines into a single string:
jsonString = ''.join(newjson)
Parse it into a Python object:
jsonData = json.loads(jsonString)
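Stripping and joining the lines by hand works, but `json.load` can parse an open file directly. Line breaks and indentation are ordinary JSON whitespace, so the strip/join step is not needed. The sketch below uses `io.StringIO` as a hypothetical stand-in for `open('./example.json', 'r')`, since `json.load` accepts any file-like object.

```python
import io
import json

# Stand-in for open('./example.json', 'r'); any object with a .read() method works.
f = io.StringIO('{\n  "source": {\n    "billing": {"phones": [{"area_code": "11"}]}\n  }\n}\n')

# Read and parse the whole stream in one call.
jsonData = json.load(f)
print(jsonData['source']['billing']['phones'][0]['area_code'])  # 11
```

With a real file this becomes `with open('./example.json', 'r') as f: jsonData = json.load(f)`.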
Extract the field with ordinary dictionary and list indexing, then build a pandas DataFrame from it:
newDF = pd.DataFrame({"A_col": jsonString, "area_code": jsonData['source']['billing']['phones'][0]['area_code']}, index=[0])
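The answer above extracts a single field by hand. For the CSV described in the question, with one JSON string per row, `pd.json_normalize` can turn all nested objects into dotted column names at once. Arrays such as `phones` are left as Python lists in one cell, so they are expanded separately here with one column per element. This is a minimal sketch under a few assumptions. The column name `json` and the two sample rows are hypothetical, and the DataFrame is built inline instead of with `pd.read_csv`.

```python
import json

import pandas as pd

# Hypothetical stand-in for pd.read_csv(...): one JSON string per row in a "json" column.
df = pd.DataFrame({"json": [
    '{"source": {"analyze": true, "billing": {"name": "Ann", "phones": [{"number": "111"}]}}}',
    '{"source": {"analyze": false, "billing": {"name": "Bob", '
    '"phones": [{"number": "222"}, {"number": "333"}]}}}',
]})

# Parse each row, then flatten nested dicts into columns such as "source.billing.name".
records = df["json"].map(json.loads).tolist()
flat = pd.json_normalize(records)

# json_normalize leaves lists untouched, so pull the phones column out and
# give every array element its own indexed columns (repeat for other arrays).
phones = flat.pop("source.billing.phones")
expanded = pd.DataFrame([
    {f"source.billing.phones.{i}.{key}": value
     for i, phone in enumerate(phone_list)
     for key, value in phone.items()}
    for phone_list in phones
])

# One row per transaction; transactions with fewer phones get NaN in the extra columns.
result = pd.concat([flat, expanded], axis=1)
print(result)
```

`result.to_excel("transactions.xlsx", index=False)` would then write the Excel file the question asks for. This requires an Excel engine such as openpyxl to be installed.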