Pandas DataFrame: appending nested objects to convert to JSON

Time: 2019-12-05 17:54:16

Tags: python json pandas dataframe if-statement

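Basically, I am using pandas to read an xlsx file and convert it to a JSON file. I know how to do that part, but I think I have to write an 'if' statement that reads each row, finds the elements that differ from the previous row, and appends them to my object.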

The data I am reading:

id     label        id_customer     label_customer    part_number
6      Sao Paulo    CUST-99992      Brazil            7897
6      Sao Paulo    CUST-99992      Brazil            982
6      Sao Paulo    CUST-43535      Brazil            435
92     Hong Hong    CUST-88888      China             785

===============================

This is my code:

import pandas as pd
import json

file_imported = pd.read_excel('testing.xlsx', sheet_name = 'Plan1')

list_final  = []
for index, row in file_imported.iterrows():
    list1 = []
    list_final.append ({
        "id"       : int(row['id']),
        "label"    : str(row['label']),
        "Customer" : list1
        })

    list2 = []
    list1.append ({
       "id"       : str(row['id_customer']) ,
       "label"    : str(row['label_customer']),
       "number"   :  list2
       })

    list2.append({
        "part"    : str(row['part_number'])  
       })      

print (list_final)

with open ('testing.json', 'w') as f:
    json.dump(list_final, f, indent= True)

===============================

JSON output:

    [
     {
      "id": 6,
      "label": "Sao Paulo",
      "Customer": [
       {
        "id": "CUST-99992",
        "label": "Brazil",
        "number" : [
        {
        "part": "7897"
        }
        ]
       }
      ]
     },
     {
      "id": 6,
      "label": "Sao Paulo",
      "Customer": [
       {
        "id": "CUST-99992",
        "label": "Brazil",
        "number" : [
        {
        "part": "982"
        }
        ]
       }
      ]
     },
     {
      "id": 6,
      "label": "Sao Paulo",
      "Customer": [
       {
        "id": "CUST-43535",
        "label": "Brazil",
        "number" : [
        {
        "part": "435"
        }
        ]
       }
      ]
     },
     {
      "id": 92,
      "label": "Hong Hong",
      "Customer": [
       {
        "id": "CUST-88888",
        "label": "China",
        "number" : [
        {
        "part": "785"
        }
        ]
       }
      ]
     }
    ]  

===============================

But I need something like this:

[
 {
  "id": 6,
  "label": "Sao Paulo",
  "Customer": [
   {
    "id": "CUST-99992",
    "label": "Brazil",
    "number" : [
    {
    "part": "7897"
    },
    {
    "part": "982"
    }
    ]
   },
   {
    "id": "CUST-43535",
    "label": "Brazil",
    "number" : [
    {
    "part": "435"
    }
    ]
   }
  ]
 },
 {
  "id": 92,
  "label": "Hong Hong",
  "Customer": [
   {
    "id": "CUST-88888",
    "label": "China",
    "number" : [
    {
    "part": "785"
    }
    ]
   }
  ]
 }
]

====================

Can anyone help me?

1 answer:

Answer 0: (score: 1)

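Looking at the desired JSON, the rows are grouped at two levels. The first level is keyed by the id and label fields, and the second by the id_customer and label_customer fields. The innermost data is part_number, which can be built with a list comprehension that wraps each value in a dictionary: [{'part': str(p)} for p in df2['part_number']]. The rest is just data wrangling.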
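To see the innermost step in isolation, here is a minimal sketch. It assumes a small hand-built frame named df2 that stands in for one customer's rows (the two CUST-99992 rows from the question); it is an illustration, not a piece of the code that follows.

```python
import pandas as pd

# Hypothetical stand-in for one innermost group: the two CUST-99992 rows.
df2 = pd.DataFrame({'part_number': [7897, 982]})

# Wrap each part number in its own {'part': ...} dictionary, as a string.
parts = [{'part': str(p)} for p in df2['part_number']]
print(parts)  # [{'part': '7897'}, {'part': '982'}]
```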

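import json
import pandas as pd

# Load the same file and sheet used in the question.
df = pd.read_excel('testing.xlsx', sheet_name='Plan1')

result = []
# Outer level: one record per (id, label) pair.
for labels, df1 in df.groupby(['id', 'label']):
    id_, label = labels
    # int() turns the NumPy integer into a plain int that json can serialize.
    record = {'id': int(id_), 'label': label, 'Customer': []}
    # Inner level: one entry per (id_customer, label_customer) pair.
    # groupby sorts the group keys by default, so CUST-43535 comes before CUST-99992.
    for inner_labels, df2 in df1.groupby(['id_customer', 'label_customer']):
        id_, label = inner_labels
        record['Customer'].append({
            'id': id_,
            'label': label,
            'number': [{'part': str(p)} for p in df2['part_number']]
        })
    result.append(record)

print(json.dumps(result, indent=True))

Output:
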
[
 {
  "id": 6,
  "label": "Sao Paulo",
  "Customer": [
   {
    "id": "CUST-43535",
    "label": "Brazil",
    "number": [
     {
      "part": "435"
     }
    ]
   },
   {
    "id": "CUST-99992",
    "label": "Brazil",
    "number": [
     {
      "part": "7897"
     },
     {
      "part": "982"
     }
    ]
   }
  ]
 },
 {
  "id": 92,
  "label": "Hong Kong",
  "Customer": [
   {
    "id": "CUST-88888",
    "label": "China",
    "number": [
     {
      "part": "785"
     }
    ]
   }
  ]
 }
]
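
In the output above the customers come out in sorted order (CUST-43535 before CUST-99992), whereas the requested JSON lists them in the order they appear in the sheet. If that order matters, one possible variation is to pass sort=False to groupby. The sketch below is only an illustration under stated assumptions: the frame is built inline from the sample rows instead of being read from testing.xlsx, and nest is a hypothetical helper name, not something from the original answer.

```python
import json
import pandas as pd

# Inline copy of the sample rows from the question (stands in for testing.xlsx).
df = pd.DataFrame({
    'id': [6, 6, 6, 92],
    'label': ['Sao Paulo', 'Sao Paulo', 'Sao Paulo', 'Hong Hong'],
    'id_customer': ['CUST-99992', 'CUST-99992', 'CUST-43535', 'CUST-88888'],
    'label_customer': ['Brazil', 'Brazil', 'Brazil', 'China'],
    'part_number': [7897, 982, 435, 785],
})


def nest(frame):
    """Hypothetical helper: build the nested records in first-seen order."""
    result = []
    # sort=False keeps groups in the order their keys first appear.
    for (id_, label), outer in frame.groupby(['id', 'label'], sort=False):
        record = {'id': int(id_), 'label': label, 'Customer': []}
        inner_groups = outer.groupby(['id_customer', 'label_customer'], sort=False)
        for (cust_id, cust_label), inner in inner_groups:
            record['Customer'].append({
                'id': cust_id,
                'label': cust_label,
                'number': [{'part': str(p)} for p in inner['part_number']],
            })
        result.append(record)
    return result


nested = nest(df)
print(json.dumps(nested, indent=1))
```

With the sample rows, CUST-99992 is now listed before CUST-43535 under id 6, which matches the order in the requested output.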