I am trying to convert the DataFrame below into the required JSON:
column_id,column_name,mandatory,column_data_type,column_data_length,_id,data_format,file_type,active_ind
1,PAT_ID,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
2,PAT_NAME,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
3,PAT_AGE,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
into JSON like the following:
{
    "_id": "5f2193c39448c44f0c1b65e0",
    "data_format": "TEXT",
    "file_type": "FACT",
    "columns": [
        {
            "column_id": 1,
            "column_name": "PAT_ID",
            "mandatory": "false",
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        },
        {
            "column_id": 2,
            "column_name": "PAT_NAME",
            "mandatory": "false",
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        }
    ],
    "active_ind": "true"
}
I tried several approaches that group by column name and column ID.
The following groups the columns, but not all of the values:
cac= df.groupby('column_id').apply(lambda x: x.to_json(orient='records'))
cac = df.to_json(orient='records')
I cannot separate the ID-level fields from the columns.
Please help.
Answer 0 (score: 0)
Here is what I would do:
import json

import pandas as pd

# Load data
df = pd.read_csv('data.csv')

# Build one dict per row from the fields that describe a column
col_set = ['column_id',
           'column_name',
           'mandatory',
           'column_data_type',
           'column_data_length']
df['columns'] = df[col_set].apply(lambda x: x.to_dict(), axis=1)

reorder = ['column_id',
           'column_name',
           'mandatory',
           'column_data_type',
           'column_data_length',
           'columns',
           'active_ind',
           '_id',
           'data_format',
           'file_type']
df = df[reorder]

# Group rows that share the same top-level fields and collect their dicts into a list
col_set_2 = ['_id', 'data_format', 'file_type', 'columns', 'active_ind']
col_set_3 = ['_id', 'data_format', 'file_type', 'active_ind']
df2 = df[col_set_2].groupby(col_set_3)['columns'].apply(list).reset_index()
df2 = df2[col_set_2]

# DataFrame to JSON (parsed[0] is the first, and here the only, group)
parsed = json.loads(df2.to_json(orient='records', indent=4))
result = json.dumps(parsed[0], indent=4)
print(result)
This prints:
{
    "_id": "5f2193c39448c44f0c1b65e0",
    "data_format": "TEXT",
    "file_type": "FACT",
    "columns": [
        {
            "column_id": 1,
            "column_name": "PAT_ID",
            "mandatory": false,
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        },
        {
            "column_id": 2,
            "column_name": "PAT_NAME",
            "mandatory": false,
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        },
        {
            "column_id": 3,
            "column_name": "PAT_AGE",
            "mandatory": false,
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        }
    ],
    "active_ind": true
}
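If the file can contain more than one `_id`, note that `parsed[0]` above keeps only the first group. A minimal sketch of a variant that returns one document per group is shown below. It is not the only way to do this: the names `build_documents`, `HEADER_KEYS`, `COLUMN_KEYS` and `CSV_TEXT` are made up for illustration, and the sample data is inlined through `io.StringIO` instead of being read from `data.csv`, so the snippet runs on its own.

```python
import io
import json

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
CSV_TEXT = """column_id,column_name,mandatory,column_data_type,column_data_length,_id,data_format,file_type,active_ind
1,PAT_ID,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
2,PAT_NAME,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
3,PAT_AGE,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
"""

# Hypothetical names: fields shared by a whole document vs. fields per column.
HEADER_KEYS = ['_id', 'data_format', 'file_type', 'active_ind']
COLUMN_KEYS = ['column_id', 'column_name', 'mandatory',
               'column_data_type', 'column_data_length']


def build_documents(df):
    """Return one nested dict per distinct combination of HEADER_KEYS."""
    documents = []
    for _, group in df.groupby(HEADER_KEYS, sort=False):
        # Round-tripping through to_json turns NumPy scalars into plain
        # Python values, so the result can be passed to json.dumps.
        header = json.loads(
            group[HEADER_KEYS].iloc[[0]].to_json(orient='records'))[0]
        columns = json.loads(group[COLUMN_KEYS].to_json(orient='records'))
        documents.append({**header, 'columns': columns})
    return documents


df = pd.read_csv(io.StringIO(CSV_TEXT))
documents = build_documents(df)
print(json.dumps(documents, indent=4))
```

In each resulting object the `columns` key comes last rather than before `active_ind`. JSON objects are unordered, so this should not matter to a consumer, but the dict can be rebuilt in a specific key order if the exact layout is required.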