I have the following DataFrame (simplified):
Column0 Column1 Type
Asset Code
R0083TX3P3PATX999 0.00 0.00 variable_name_1
R0084TX3P3WTXNM99 55.74 55.74 variable_name_1
R0087KY2P2KY99999 265.35 265.35 variable_name_1
T7001OK2P2OK99999 0.00 0.00 variable_name_2
T7029LA3P3SLA9999 0.00 0.00 variable_name_2
T7032CA5P5SW99999 0.00 0.00 variable_name_2
T7001OK2P2OK99999 0.00 0.00 variable_name_3
T7029LA3P3SLA9999 9.00 9.00 variable_name_3
T7032CA5P5SW99999 14.00 14.00 variable_name_3
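In practice, I concatenated three different DataFrames after adding a "Type" column to each, so that I can tell which one every row came from.
My end goal is to create a nested JSON with the column name as the top-level key and the "Asset Code" as the second-level key, which in turn holds a third nested object containing just the associated variables and their values.
The idea is that the JSON can then be accessed like this: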
data['Column0']['Asset-Code']
and the list of values will be returned.
The output would be a JSON object in the following format:
{
"Column1": {
"R0083TX3P3PATX999": {
"variable_name_1": 0
},
"R0084TX3P3WTXNM99": {
"variable_name_1": 55.74
},
"R0087KY2P2KY99999": {
"variable_name_1": 265.35
},
"T7001OK2P2OK99999": {
"variable_name_2": 0,
"variable_name_3": 0
},
"T7029LA3P3SLA9999": {
"variable_name_2": 0,
"variable_name_3": 9.0
},
"T7032CA5P5SW99999": {
"variable_name_2": 0,
"variable_name_3": 14
}
},
"Column2": {
"R0083TX3P3PATX999": {
"variable_name_1": 0
},
"R0084TX3P3WTXNM99": {
"variable_name_1": 55.74
},
"R0087KY2P2KY99999": {
"variable_name_1": 265.35
},
"T7001OK2P2OK99999": {
"variable_name_2": 2,
"variable_name_3": 3
},
"T7029LA3P3SLA9999": {
"variable_name_2": 2,
"variable_name_3": 9.0
},
"T7032CA5P5SW99999": {
"variable_name_2": 0,
"variable_name_3": 14
}
}
}
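I'm not sure how to do this. Does it mean the new master DataFrame containing all the features needs to be reindexed (or given a MultiIndex)? I have also been looking at the groupby functions, but I'm not sure how to apply them, especially because of that third nested object. Originally this was very easy, because I just exported with to_json using orient=columns and it worked well, but only for two levels of data.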
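For reference, the two-level export described above might look roughly like the sketch below. The frame named single and its two rows are hypothetical, standing in for one of the source DataFrames before concatenation, where each asset code appears once. As far as I know, to_json with orient='columns' expects a unique index, which is one reason it stops being enough once the same asset code appears under several types.

```python
import json
import pandas as pd

# Hypothetical single-source frame: one row per asset, so the index is unique
single = pd.DataFrame(
    {'Column1': [0.00, 55.74], 'Column2': [0.00, 55.74]},
    index=pd.Index(['R0083TX3P3PATX999', 'R0084TX3P3WTXNM99'], name='Asset Code'),
)

# orient='columns' gives {column -> {index -> value}}, i.e. only two levels
two_level = json.loads(single.to_json(orient='columns'))
print(two_level['Column1']['R0084TX3P3WTXNM99'])
```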
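Answer 0 (score: 2)
You can do this by looping over the value columns, pivoting each one, and converting the result to a dict with orient='index':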
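import numpy as np
# Pivot each value column (rows: asset codes, columns: types), then export as nested dicts.
# Keyword arguments are used because recent pandas versions reject positional ones in pivot.
res = {col: df.pivot(index='Asset Code', columns='Type', values=col)\
               .replace({np.nan: None})\
               .to_dict(orient='index') for col in ['Column1', 'Column2']}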
Output:
{'Column1': {'R0083TX3P3PATX999': {'variable_name_1': 0.0,
'variable_name_2': None,
'variable_name_3': None},
'R0084TX3P3WTXNM99': {'variable_name_1': 55.74,
'variable_name_2': None,
'variable_name_3': None},
'R0087KY2P2KY99999': {'variable_name_1': 265.35,
'variable_name_2': None,
'variable_name_3': None},
'T7001OK2P2OK99999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 0.0},
'T7029LA3P3SLA9999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 9.0},
'T7032CA5P5SW99999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 14.0}},
'Column2': {'R0083TX3P3PATX999': {'variable_name_1': 0.0,
'variable_name_2': None,
'variable_name_3': None},
'R0084TX3P3WTXNM99': {'variable_name_1': 55.74,
'variable_name_2': None,
'variable_name_3': None},
'R0087KY2P2KY99999': {'variable_name_1': 265.35,
'variable_name_2': None,
'variable_name_3': None},
'T7001OK2P2OK99999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 0.0},
'T7029LA3P3SLA9999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 9.0},
'T7032CA5P5SW99999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 14.0}}}
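The output above carries a None entry for every type an asset does not have, whereas the format asked for in the question lists only the types that exist for each asset. If that matters, one possible variation is to group by asset code instead of pivoting. This is a minimal sketch and not part of the original answer; the five-row df is a shortened copy of the question's data, and value_cols is a name introduced here for illustration.

```python
import pandas as pd

# Shortened copy of the question's data (assumption: same column layout)
df = pd.DataFrame(
    [
        ['R0083TX3P3PATX999', 0.00, 0.00, 'variable_name_1'],
        ['R0084TX3P3WTXNM99', 55.74, 55.74, 'variable_name_1'],
        ['T7029LA3P3SLA9999', 0.00, 0.00, 'variable_name_2'],
        ['T7029LA3P3SLA9999', 9.00, 9.00, 'variable_name_3'],
        ['T7032CA5P5SW99999', 14.00, 14.00, 'variable_name_3'],
    ],
    columns=['Asset Code', 'Column1', 'Column2', 'Type'],
)

value_cols = ['Column1', 'Column2']  # hypothetical name for the columns to export

# For each value column, map every asset code to {type: value},
# using only the rows that actually exist for that asset
res = {
    col: {
        asset: dict(zip(group['Type'], group[col]))
        for asset, group in df.groupby('Asset Code')
    }
    for col in value_cols
}

print(res['Column1']['T7029LA3P3SLA9999'])
```

Because nothing is pivoted, no missing combinations are created, so there is nothing to replace with None afterwards.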
Below is the code I used to create df:
import pandas as pd
df = pd.DataFrame([
['R0083TX3P3PATX999', 0.00, 0.00, 'variable_name_1'],
['R0084TX3P3WTXNM99', 55.74 , 55.74, 'variable_name_1'],
['R0087KY2P2KY99999', 265.35 , 265.35, 'variable_name_1'],
['T7001OK2P2OK99999', 0.00 , 0.00, 'variable_name_2'],
['T7029LA3P3SLA9999', 0.00 , 0.00, 'variable_name_2'],
['T7032CA5P5SW99999', 0.00 , 0.00, 'variable_name_2'],
['T7001OK2P2OK99999', 0.00 , 0.00, 'variable_name_3'],
['T7029LA3P3SLA9999', 9.00 , 9.00, 'variable_name_3'],
['T7032CA5P5SW99999', 14.00 , 14.00, 'variable_name_3'],
], columns = ['Asset Code','Column1','Column2', 'Type'])
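Since the question asks for JSON rather than a Python dict, a final serialization step is probably needed. The sketch below applies the pivot approach from this answer to a hypothetical three-row frame called sample (same column layout as df) and passes the result through the standard json module; the names sample, json_text and parsed are introduced here for illustration only.

```python
import json
import numpy as np
import pandas as pd

# Hypothetical three-row sample in the same shape as df
sample = pd.DataFrame(
    [
        ['R0084TX3P3WTXNM99', 55.74, 55.74, 'variable_name_1'],
        ['T7029LA3P3SLA9999', 0.00, 0.00, 'variable_name_2'],
        ['T7029LA3P3SLA9999', 9.00, 9.00, 'variable_name_3'],
    ],
    columns=['Asset Code', 'Column1', 'Column2', 'Type'],
)

# Same pivot-and-export expression as in the answer
res = {
    col: sample.pivot(index='Asset Code', columns='Type', values=col)
               .replace({np.nan: None})
               .to_dict(orient='index')
    for col in ['Column1', 'Column2']
}

# json.dumps writes None as null, so the resulting string is valid JSON
json_text = json.dumps(res, indent=2)

# Round-trip to show the access pattern from the question
parsed = json.loads(json_text)
print(parsed['Column1']['T7029LA3P3SLA9999'])
```

Replacing NaN with None before serializing matters here: json.dumps would otherwise emit the bare token NaN, which strict JSON parsers reject.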