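Converting a pandas DataFrame to JSON

Time: 2019-12-06 17:31:15

Tags: python json pandas dataframe data-science

Basically, I am reading a pandas DataFrame and converting it to JSON. I am a beginner at coding, but I know that using apply is a better choice than iterrows (I have already tried apply, but I had trouble understanding the syntax and working out my solution).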

==============================

The data I read from Excel:

id     label        id_customer     label_customer    part_number   number_customer   product   label_product    key     country       value_product
6      Sao Paulo    CUST-99992      Brazil            982           10                sho1564   shoes            SH-99   Chile         1.5
6      Sao Paulo    CUST-99992      Brazil            982           10                sn47282   sneakers         SN-71   Germany       43.8
6      Sao Paulo    CUST-43535      Argentina         435           15                sk84393   skirt            SK-11   Netherlands   87.1
92     Hong Kong    CUST-88888      China             785           58                ca40349   cap              CA-82   Russia        3.95

==============================

Code:

import pandas as pd 
import json

df = pd.read_excel(path)

result = []
for labels, df1 in df.groupby(['id', 'label'],sort=False):
    id_, label = labels
    record = {'id': int(id_), 'label': label, 'Customer': []}
    for inner_labels, df2 in df1.groupby(['id_customer', 'label_customer'],sort=False):
        id_,label = inner_labels
        record['Customer'].append({
            'id': id_,
            'label': label,
            'Number': [{'part': str(p), 'number_customer': str(s)} for p, s in zip(df2['part_number'], df2['number_customer'])]  
            })

    result.append(record)

==============================

The JSON I get:

[
 {
  "id": 6,
  "label": "Sao Paulo",
  "Customer": [
   {
    "id": "CUST-99992",
    "label": "Brazil",
    "Number": [
     {
      "part": "982",
      "number_customer": "10"
     },
     {
      "part": "982",
      "number_customer": "10"
     }
    ]
   },
   {
    "id": "CUST-43535",
    "label": "Argentina",
    "Number": [
     {
      "part": "435",
      "number_customer": "15"
     }
    ]
   }
  ]
 },
 {
  "id": 92,
  "label": "Hong Kong",
  "Customer": [
   {
    "id": "CUST-88888",
    "label": "China",
    "Number": [
     {
      "part": "785",
      "number_customer": "58"
     }
    ]
   }
  ]
 }
]

==============================

The JSON I expect:

[
 {
  "id": 6,
  "label": "Sao Paulo",
  "Customer": [
   {
    "id": "CUST-99992",
    "label": "Brazil",
    "Number": [
     {
      "part": "982",
      "number_customer": "10",
      "Products": [
       {
        "product": "sho1564",
        "label_product": "shoes",
        "Order": [
         {
          "key": "SH-99",
          "country": "Chile",
          "value_product": "1.5"
         }
        ]
       },
       {
        "product": "sn47282",
        "label_product": "sneakers",
        "Order": [
         {
          "key": "SN-71",
          "country": "Germany",
          "value_product": "43.8"
         }
        ]
       }
      ]
     }
    ]
   },
   {
    "id": "CUST-43535",
    "label": "Argentina",
    "Number": [
     {
      "part": "435",
      "number_customer": "15",
      "Products": [
       {
        "product": "sk84393",
        "label_product": "skirt",
        "Order": [
         {
          "key": "SK-11",
          "country": "Netherlands",
          "value_product": "87.1"
         }
        ]
       }
      ]
     }
    ]
   }
  ]
 },
 {
  "id": 92,
  "label": "Hong Kong",
  "Customer": [
   {
    "id": "CUST-88888",
    "label": "China",
    "Number": [
     {
      "part": "785",
      "number_customer": "58",
      "Products": [
       {
        "product": "ca40349",
        "label_product": "cap",
        "Order": [
         {
          "key": "CA-82",
          "country": "Russia",
          "value_product": "3.95"
         }
        ]
       }
      ]
     }
    ]
   }
  ]
 }
]

==============================

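The columns nest as follows: id and label form one group; id_customer and label_customer form another; part_number and number_customer are a third group; product and label_product are a fourth; and key, country and value_product are the innermost group.

The JSON I expect depends on the information in my DataFrame.

Can anyone help me in any way?

2 answers:

Answer 0: (score: 1)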

import pandas as pd
import json

df = pd.read_excel(path)  # path: location of the Excel file, as in the question

result = []
# Level 1: one record per (id, label)
for labels, df1 in df.groupby(['id', 'label'], sort=False):
    id_, label = labels
    record = {'id': int(id_), 'label': label, 'Customer': []}
    # Level 2: customers within that record
    for inner_labels, df2 in df1.groupby(['id_customer', 'label_customer'], sort=False):
        id_, label = inner_labels
        customer = {'id': id_, 'label': label, 'Number': []}
        # Level 3: part/number pairs within that customer
        for inner_labels, df3 in df2.groupby(['part_number', 'number_customer'], sort=False):
            p, s = inner_labels
            number = {'part': str(p), 'number_customer': str(s), 'Products': []}
            # Level 4: products within that part/number pair
            for inner_labels, df4 in df3.groupby(['product', 'label_product'], sort=False):
                p, lp = inner_labels
                product = {'product': p, 'label_product': lp, 'Order': []}
                # Level 5: one order entry per remaining row
                for k, c, v in zip(df4['key'], df4['country'], df4['value_product']):
                    # str(v) matches the quoted values in the expected JSON
                    product['Order'].append({'key': k, 'country': c, 'value_product': str(v)})
                number['Products'].append(product)
            customer['Number'].append(number)
        record['Customer'].append(customer)
    result.append(record)

print(json.dumps(result, indent=1))
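
Every loop level in the code above does the same thing: group by some columns, emit them as fields, and nest the remaining rows under a child key. Here is a minimal sketch of that pattern as a recursive helper. The names `LEVELS` and `nest` are hypothetical, the sample rows are typed in from the question's table instead of being read from Excel, and casting everything except the top-level `id` to `str` is an assumption made to match the expected JSON.

```python
import json
import pandas as pd

# Sample rows copied from the question's table (stands in for pd.read_excel(path)).
df = pd.DataFrame({
    'id': [6, 6, 6, 92],
    'label': ['Sao Paulo', 'Sao Paulo', 'Sao Paulo', 'Hong Kong'],
    'id_customer': ['CUST-99992', 'CUST-99992', 'CUST-43535', 'CUST-88888'],
    'label_customer': ['Brazil', 'Brazil', 'Argentina', 'China'],
    'part_number': [982, 982, 435, 785],
    'number_customer': [10, 10, 15, 58],
    'product': ['sho1564', 'sn47282', 'sk84393', 'ca40349'],
    'label_product': ['shoes', 'sneakers', 'skirt', 'cap'],
    'key': ['SH-99', 'SN-71', 'SK-11', 'CA-82'],
    'country': ['Chile', 'Germany', 'Netherlands', 'Russia'],
    'value_product': [1.5, 43.8, 87.1, 3.95],
})

# Hypothetical level spec: ({output field: source column}, child key or None).
# Each level groups by at least two columns, so groupby yields tuples of keys.
LEVELS = [
    ({'id': 'id', 'label': 'label'}, 'Customer'),
    ({'id': 'id_customer', 'label': 'label_customer'}, 'Number'),
    ({'part': 'part_number', 'number_customer': 'number_customer'}, 'Products'),
    ({'product': 'product', 'label_product': 'label_product'}, 'Order'),
    ({'key': 'key', 'country': 'country', 'value_product': 'value_product'}, None),
]

def nest(frame, levels, depth=0):
    """Group `frame` by the first level's columns and recurse into the rest."""
    fields, child_key = levels[0]
    columns = list(fields.values())
    out = []
    # sort=False keeps groups in order of first appearance.
    for keys, group in frame.groupby(columns, sort=False):
        record = {}
        for name, value in zip(fields, keys):
            # Assumption: only the top-level id stays an integer; the rest become strings.
            record[name] = int(value) if depth == 0 and name == 'id' else str(value)
        if child_key is not None:
            record[child_key] = nest(group, levels[1:], depth + 1)
        out.append(record)
    return out

result = nest(df, LEVELS)
print(json.dumps(result, indent=1))
```

Adding or removing a nesting level then only means editing `LEVELS`; the explicit loops may still be easier to read for a fixed, small number of levels.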

Answer 1: (score: 1)

Hope this is useful!

from io import StringIO
import pandas as pd
import json

csv = """id,label,id_customer,label_customer,part_number,number_customer,product,label_product,key,country,value_product
6,Sao Paulo,CUST-99992,Brazil,982,10,sho1564,shoes,SH-99,Chile,1.5
6,Sao Paulo,CUST-99992,Brazil,982,10,sn47282,sneakers,SN-71,Germany,43.8
6,Sao Paulo,CUST-43535,Argentina,435,15,sk84393,skirt,SK-11,Netherlands,87.1
92,Hong Kong,CUST-88888,China,785,58,ca40349,cap,CA-82,Russia,3.95"""
csv = StringIO(csv)

df = pd.read_csv(csv)

def split(df, groupby, json_func):
    # Yield one JSON-like dict per group; sort=False keeps first-appearance order
    for x, group in df.groupby(groupby, sort=False):
        yield json_func(group, *x)

a = list(split(df, ['id', 'label'], lambda grp, id_, label: {
    "id": id_, "label": label,
    "Customer": list(split(grp, ['id_customer', 'label_customer'], lambda grp_1, id_cust, label_cust: {
        "id": id_cust, "label": label_cust,
        "Number": list(split(grp_1, ['part_number', 'number_customer'], lambda grp_2, part, num_cust: {
            "part": part, "number_customer": num_cust,
            "Products": list(split(grp_2, ['product', 'label_product'], lambda grp_3, product, label_product: {
                "product": product, "label_product": label_product,
                "Order": list(split(grp_3, ['key', 'country', 'value_product'], lambda _, key, country, value_product: {
                    "key": key, "country": country, "value_product": value_product,
                })),
            })),
        })),
    })),
}))

# display() only exists in IPython/Jupyter, so print the JSON instead.
# NumPy integers are not JSON serializable; .item() converts them to Python ints.
print(json.dumps(a, indent=1, default=lambda o: o.item()))
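
One detail worth checking in any groupby-based solution: `DataFrame.groupby` sorts group keys by default, while the expected JSON lists customers in order of first appearance. Passing `sort=False` preserves that order. A small sketch, using a hand-typed two-column stand-in for the question's data:

```python
import pandas as pd

# Stand-in for the customer columns of the rows with id 6.
df = pd.DataFrame({
    'id_customer': ['CUST-99992', 'CUST-99992', 'CUST-43535'],
    'label_customer': ['Brazil', 'Brazil', 'Argentina'],
})

cols = ['id_customer', 'label_customer']

# Default behaviour: group keys come back sorted.
sorted_keys = [key for key, _ in df.groupby(cols)]

# sort=False: group keys come back in order of first appearance.
original_keys = [key for key, _ in df.groupby(cols, sort=False)]

print(sorted_keys)    # CUST-43535 first
print(original_keys)  # CUST-99992 first
```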