I am reading a DataFrame and trying to nest one list inside another, then convert the result to a JSON file. I am using Python 3 and pandas 0.25.3.
============================
The data I am reading:
id label id_customer label_customer part_number number_client
6 Sao Paulo CUST-99992 Brazil 7897 982
6 Sao Paulo CUST-99992 Brazil 888 12
92 Hong Kong CUST-88888 China 147 288
============================
Here is my code:
import pandas as pd
import json
data = pd.read_excel(path)
data["part_number"] = data["part_number"].apply(lambda x: str(x))
data["number_client"] = data["number_client"].apply(lambda x: str(x))
data = data.groupby(["id", "label", "id_customer", "label_customer"], as_index=False).agg("#".join)
data["part_number"] = data["part_number"].apply(lambda x: {"part": x})
data["number_client"] = data["number_client"].apply(lambda x: {"client": x})
data["id_customer"] = data["id_customer"].apply(lambda x: {"id": x})
data["label_customer"] = data["label_customer"].apply(lambda x: {"label": x})
data["Customer"] = data.apply(lambda x: [{**x["id_customer"], **x["label_customer"]}],axis=1)
data["number"] = data.apply(lambda x: [{**x["part_number"], **x["number_client"]}], axis=1)
data = data[["id", "label", "Customer","number"]]
data.to_json(path)
============================
Expected result:
[{
"id": 6,
"label": "Sao Paulo",
"Customer": [{
"id": "CUST-99992",
"label": "Brazil",
"number": [{
"part": "7897",
"client": "982"
},
{
"part": "888",
"client": "12"
}]
}]
},
{
"id": 92,
"label": "Hong Kong",
"Customer": [{
"id": "CUST-88888",
"label": "China",
"number": [{
"part": "147",
"client": "288"
}]
}]
}]
============================
What I get instead:
[{
"id": 6,
"label": "Sao Paulo",
"Customer": [{
"id": "CUST-99992",
"label": "Brazil"
}],
"number": [{
"part": "7897",
"client": "982"
}],
"number": [{
"part": "888",
"client": "12"
}]
}, {
"id": 92,
"label": "Hong Kong",
"Customer": [{
"id": "CUST-88888",
"label": "China"
}],
"number": [{
"part": "147",
"client": "288"
}]
}]
=====================
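I tried to do the same thing with the iterrows function (and posted a question about it here, "Writing a DataFrame with pandas and converting it to JSON"), but someone suggested that I try a different function and a different approach. I know that adding the data object inside number is a silly thing to do, but I have already tried other ways.
Can you help me?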
Answer 0 (score: 3)
Define the following reformatting function:
def reformat(row):
    d1 = { 'part': str(row.part_number), 'client': str(row.number_client)}
    d2 = { 'id': row.id_customer, 'label': row.label_customer, 'number': [d1] }
    return { 'id': row.id, 'label': row.label, 'Customer': [d2] }
Then apply it as follows:
df.apply(reformat, axis=1).to_json('result.json', orient='records')
The result (reformatted for readability) is:
[ { "id":6,
"label":"Sao Paulo",
"Customer":[
{ "id":"CUST-99992",
"label":"Brazil",
"number":[{"part":"7897","client":"982"}]
}
]
},
{ "id":92,
"label":"Hong Kong",
"Customer":[
{ "id":"CUST-88888",
"label":"China",
"number":[{"part":"147","client":"288"}]
}
]
}
]
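As a quick way to try the per-row approach without an Excel file, here is a minimal sketch. It assumes a hypothetical in-memory DataFrame with one row per id, uses bracket access (row["id"]) instead of attribute access, and collects the records with tolist() and json.dumps rather than writing result.json. Those choices are illustrative, not part of the answer above.

```python
import json
import pandas as pd

# Hypothetical in-memory sample with one row per id (not read from Excel).
df = pd.DataFrame({
    "id": [6, 92],
    "label": ["Sao Paulo", "Hong Kong"],
    "id_customer": ["CUST-99992", "CUST-88888"],
    "label_customer": ["Brazil", "China"],
    "part_number": [7897, 147],
    "number_client": [982, 288],
})

def reformat(row):
    # Innermost dict: the part/client pair, stored as strings.
    d1 = {"part": str(row["part_number"]), "client": str(row["number_client"])}
    # Customer dict carries the "number" list inside it.
    d2 = {"id": row["id_customer"], "label": row["label_customer"], "number": [d1]}
    # int() makes sure the id is a plain Python int for json.dumps.
    return {"id": int(row["id"]), "label": row["label"], "Customer": [d2]}

# One dict per row, collected into a plain Python list.
records = df.apply(reformat, axis=1).tolist()
print(json.dumps(records, indent=2))
```

With the question's three-row sample, where id 6 appears twice, this per-row approach would emit two separate records for id 6, because each row is converted on its own.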
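To handle the variant with multiple rows for a single label / label_customer, take a different approach:
Start by defining the following functions:
Get the content of the number attribute: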
def getNum(grp):
    return eval(grp[['part', 'client']].to_json(orient='records'))
Note the eval in this function. Without it, the result would be a string (rather than a list of dictionaries).
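One caveat worth knowing: eval runs its argument as Python code, and it raises a NameError on JSON literals such as null, true or false. The sketch below shows two possible alternatives that avoid it. The names get_num and get_num_direct and the small sample group are hypothetical, chosen only for illustration.

```python
import json
import pandas as pd

def get_num(grp):
    # to_json returns a JSON string; json.loads parses it into Python objects.
    return json.loads(grp[["part", "client"]].to_json(orient="records"))

def get_num_direct(grp):
    # to_dict builds the list of dicts directly, with no JSON round trip.
    return grp[["part", "client"]].to_dict(orient="records")

# Hypothetical group: two rows whose columns were already converted to strings.
grp = pd.DataFrame({"part": ["7897", "888"], "client": ["982", "12"]})
print(get_num(grp))
print(get_num_direct(grp))
```

Both functions return a list of dictionaries for this group, so either could stand in for the eval call.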
Get the content of the Customer attribute:
def getCust(grp):
    r0 = grp.iloc[0]
    return { 'id': r0.id_customer, 'label': r0.label_customer, 'number': getNum(grp) }
Get the content of the whole JSON element for the current group:
def getGrp(grp):
    r0 = grp.iloc[0]
    return { 'id': r0.id, 'label': r0.label, 'Customer': getCust(grp) }
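To see the nested shape these three helpers are meant to build together, here is a self-contained sketch that writes the same nesting as an explicit loop over the groups. It is a variation, not the answer's code. It assumes a hypothetical in-memory copy of the question's three rows, converts the numbers with str() inline, and wraps the customer dict in a list to match the expected output shown in the question. The names keys and records are illustrative.

```python
import json
import pandas as pd

# Hypothetical in-memory copy of the question's sample data.
df = pd.DataFrame({
    "id": [6, 6, 92],
    "label": ["Sao Paulo", "Sao Paulo", "Hong Kong"],
    "id_customer": ["CUST-99992", "CUST-99992", "CUST-88888"],
    "label_customer": ["Brazil", "Brazil", "China"],
    "part_number": [7897, 888, 147],
    "number_client": [982, 12, 288],
})

keys = ["id", "label", "id_customer", "label_customer"]
records = []
# sort=False keeps the groups in order of first appearance.
for (id_, label, id_customer, label_customer), grp in df.groupby(keys, sort=False):
    # One {"part", "client"} dict per row of the group, as strings.
    numbers = [
        {"part": str(part), "client": str(client)}
        for part, client in zip(grp["part_number"], grp["number_client"])
    ]
    customer = {"id": id_customer, "label": label_customer, "number": numbers}
    # int() turns the group key into a plain Python int for json.dumps.
    records.append({"id": int(id_), "label": label, "Customer": [customer]})

print(json.dumps(records, indent=2))
```

Here both rows for id 6 end up in a single record, with two entries in its number list.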
Then convert the column types to string:
df.part_number = df.part_number.astype('str')
df.number_client = df.number_client.astype('str')
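The conversion matters because the expected JSON holds part and client as quoted strings rather than numbers. A small sketch of its effect follows, assuming a hypothetical two-row frame with only these two columns:

```python
import pandas as pd

# Hypothetical frame holding just the two numeric columns.
df = pd.DataFrame({"part_number": [7897, 888], "number_client": [982, 12]})
before = df.to_dict(orient="records")  # values are still integers here

df["part_number"] = df["part_number"].astype(str)
df["number_client"] = df["number_client"].astype(str)
after = df.to_dict(orient="records")   # values are now strings

print(before)
print(after)
```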
To get the final result, run:
df.rename(columns={'part_number': 'part', 'number_client': 'client'})\
.groupby(['id', 'label', 'id_customer', 'label_customer'])\
.apply(getGrp).to_json(orient='values')
The code above: