Manipulating data in a pandas DataFrame

Date: 2019-12-10 13:53:55

Tags: python json pandas dataframe

I am reading a DataFrame and trying to "insert" one list into another list, then convert the result to a JSON file. I'm using Python 3 and pandas 0.25.3.

My DataFrame:

id  label      id_customer  label_customer  part_number  number_client
6   Sao Paulo  CUST-99992   Brazil          7897         982
6   Sao Paulo  CUST-99992   Brazil          888          12
92  Hong Kong  CUST-88888   China           147          288

My code:

import pandas as pd 
import json

data = pd.read_excel(path)

data["part_number"] = data["part_number"].apply(lambda x: str(x))
data["number_client"]  = data["number_client"].apply(lambda x: str(x))

data = data.groupby(["id", "label", "id_customer", "label_customer"], as_index=False).agg("#".join)

data["part_number"] = data["part_number"].apply(lambda x: {"part": x})
data["number_client"] = data["number_client"].apply(lambda x: {"client": x})

data["id_customer"] = data["id_customer"].apply(lambda x: {"id": x})
data["label_customer"] = data["label_customer"].apply(lambda x: {"label": x})
data["number"] = data.apply(lambda x: [{**x["part_number"], **x["number_client"]}], axis=1)

data["Customer"] = data.apply(lambda x: [{**x["id_customer"], **x["label_customer"], **data["number"]}],axis=1)

data = data[["id", "label", "Customer"]]

data.to_json(path)

The JSON output I get:

[{
    "id": 6,
    "label": "Sao Paulo",
    "Customer": [{
        "id": "CUST-99992",
        "label": "Brazil",
        "0": [{
            "part": "7897",
            "client": "982"
        }],
        "1": [{
            "part": "888",
            "client": "12"
        }],
        "2": [{
            "part": "147",
            "client": "288"
        }]
    }]
}, {
    "id": 6,
    "label": "Sao Paulo",
    "Customer": [{
        "id": "CUST-99992",
        "label": "Brazil",
        "0": [{
            "part": "7897",
            "client": "982"
        }],
        "1": [{
            "part": "888",
            "client": "12"
        }],
        "2": [{
            "part": "147",
            "client": "288"
        }]
    }]
}, {
    "id": 92,
    "label": "Hong Kong",
    "Customer": [{
        "id": "CUST-88888",
        "label": "China",
        "0": [{
            "part": "7897",
            "client": "982"
        }],
        "1": [{
            "part": "888",
            "client": "12"
        }],
        "2": [{
            "part": "147",
            "client": "288"
        }]
    }]
}]
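The numeric keys "0", "1" and "2" in this output can be reproduced in isolation. In the line that builds Customer, **data["number"] unpacks the entire number column (a Series, which Python's ** treats as a mapping from index label to value) instead of the current row's x["number"], so every customer dict receives one key per row of the DataFrame. A minimal sketch with a hand-built stand-in Series; the names number, row, wrong and right are illustrative only:

```python
import pandas as pd

# Hypothetical stand-in for the question's data["number"] column:
# one single-element list per DataFrame row.
number = pd.Series([
    [{"part": "7897", "client": "982"}],
    [{"part": "888", "client": "12"}],
    [{"part": "147", "client": "288"}],
])

row = {"id": "CUST-99992", "label": "Brazil"}

# Like **data["number"]: a Series unpacks as a mapping, so every index
# label (0, 1, 2) becomes a key, whichever row is being processed.
wrong = {**row, **number}

# Probably what was intended: only the current row's value (row 0 here),
# stored under an explicit "number" key.
right = {**row, "number": number[0]}

print(list(wrong))
print(right)
```

JSON object keys are always strings, so the integer labels 0, 1 and 2 show up as "0", "1" and "2" once the frame is written with to_json.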

What I need:

[{
    "id": 6,
    "label": "Sao Paulo",
    "Customer": [{
        "id": "CUST-99992",
        "label": "Brazil",
        "number": [{
            "part": "7897",
            "client": "982"
        }, {
            "part": "888",
            "client": "12"
        }]
    }]
}, {
    "id": 92,
    "label": "Hong Kong",
    "Customer": [{
        "id": "CUST-88888",
        "label": "China",
        "number": [{
            "part": "147",
            "client": "288"
        }]
    }]
}]

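id and label form one group of information, id_customer and label_customer form a second, and part_number and number_client form a third. Customer and number are lists, and each can hold many objects (how many depends on the data in my DataFrame).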
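The three levels described above (id/label, then id_customer/label_customer, then part_number/number_client) map directly onto nested dictionaries. A minimal plain-Python sketch, assuming the rows are already available as a list of dicts (for example from data.to_dict("records") after converting the last two columns to strings); the helper name nest and the inline sample rows are hypothetical:

```python
import json

# Hypothetical sample rows mirroring the question's table, as they might
# come from data.to_dict("records") after converting the last two columns to str.
rows = [
    {"id": 6, "label": "Sao Paulo", "id_customer": "CUST-99992",
     "label_customer": "Brazil", "part_number": "7897", "number_client": "982"},
    {"id": 6, "label": "Sao Paulo", "id_customer": "CUST-99992",
     "label_customer": "Brazil", "part_number": "888", "number_client": "12"},
    {"id": 92, "label": "Hong Kong", "id_customer": "CUST-88888",
     "label_customer": "China", "part_number": "147", "number_client": "288"},
]


def nest(rows):
    """Group flat rows into id/label -> Customer -> number (first-seen order)."""
    records = {}    # (id, label) -> output record
    customers = {}  # (id, label, id_customer, label_customer) -> customer dict
    for r in rows:
        top_key = (r["id"], r["label"])
        if top_key not in records:
            records[top_key] = {"id": r["id"], "label": r["label"], "Customer": []}
        cust_key = top_key + (r["id_customer"], r["label_customer"])
        if cust_key not in customers:
            customers[cust_key] = {"id": r["id_customer"],
                                   "label": r["label_customer"], "number": []}
            records[top_key]["Customer"].append(customers[cust_key])
        # Innermost level: one object per original row.
        customers[cust_key]["number"].append(
            {"part": r["part_number"], "client": r["number_client"]})
    return list(records.values())


result = nest(rows)
print(json.dumps(result, indent=4))
```

Because the customer dicts are shared by reference between customers and records, appending to a customer's number list updates the record that contains it; dict insertion order keeps the groups in the order they first appear.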

What am I doing wrong, and how can I fix it?

Thanks a lot!

1 Answer:

Answer 0 (score: 1)

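First convert both columns to strings. Then group twice: at each level, apply a lambda that renames the columns and turns each group into a list of dicts with DataFrame.to_dict. Finally convert the output to JSON with DataFrame.to_json: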

data[["part_number", "number_client"]] = data[["part_number", "number_client"]].astype(str)

# Keep the part of each column name before the first "_":
# part_number -> part, number_client -> number, id_customer -> id, label_customer -> label
f = lambda x: x.split('_')[0]

j = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
        .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='number')
        .groupby(["id", "label"])[["id_customer", "label_customer", "number"]]
        .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='customer')
        .to_json(orient='records'))

print(j)

    [{
        "id": 6,
        "label": "Sao Paulo",
        "customer": [{
            "id": "CUST-99992",
            "label": "Brazil",
            "number": [{
                "part": "7897",
                "number": "982"
            }, {
                "part": "888",
                "number": "12"
            }]
        }]
    }, {
        "id": 92,
        "label": "Hong Kong",
        "customer": [{
            "id": "CUST-88888",
            "label": "China",
            "number": [{
                "part": "147",
                "number": "288"
            }]
        }]
    }]
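This output uses the key "number" for the client value (split('_')[0] turns number_client into number) and a lowercase "customer", whereas the question asks for "client" and "Customer". A variant of the same two-step groupby that produces exactly the requested keys, sketched under the assumption that explicit rename dictionaries (the hypothetical inner and outer below) are acceptable; the sample DataFrame is built inline from the question's table instead of being read with read_excel:

```python
import json

import pandas as pd

# The question's example data, built inline instead of read from Excel.
data = pd.DataFrame({
    "id": [6, 6, 92],
    "label": ["Sao Paulo", "Sao Paulo", "Hong Kong"],
    "id_customer": ["CUST-99992", "CUST-99992", "CUST-88888"],
    "label_customer": ["Brazil", "Brazil", "China"],
    "part_number": [7897, 888, 147],
    "number_client": [982, 12, 288],
})
data[["part_number", "number_client"]] = data[["part_number", "number_client"]].astype(str)

# Explicit renames instead of split('_')[0], so the inner key is "client".
inner = {"part_number": "part", "number_client": "client"}
outer = {"id_customer": "id", "label_customer": "label"}

j = (data.groupby(["id", "label", "id_customer", "label_customer"], sort=False)[list(inner)]
         # One list of {"part", "client"} dicts per customer.
         .apply(lambda g: g.rename(columns=inner).to_dict("records"))
         .reset_index(name="number")
         .groupby(["id", "label"], sort=False)[["id_customer", "label_customer", "number"]]
         # One list of {"id", "label", "number"} dicts per id/label pair.
         .apply(lambda g: g.rename(columns=outer).to_dict("records"))
         .reset_index(name="Customer")
         .to_json(orient="records"))

# to_json writes compact JSON; re-indent it for reading.
print(json.dumps(json.loads(j), indent=4))
```

sort=False keeps the groups in the order they first appear in the sheet rather than sorting them by key, and the double-bracket column selection and the full "records" orient name also work on newer pandas releases, where the tuple-style selection and the 'r' abbreviation are no longer accepted.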