Convert a pandas DataFrame to JSON with a specific format

Time: 2020-08-19 18:52:22

Tags: json python-3.x pandas

I am trying to convert the DataFrame shown below into the required JSON.


column_id,column_name,mandatory,column_data_type,column_data_length,_id,data_format,file_type,active_ind
1,PAT_ID,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
2,PAT_NAME,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
3,PAT_AGE,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE

The JSON should look like this:

{
    "_id": "5f2193c39448c44f0c1b65e0",
    "data_format": "TEXT",
    "file_type": "FACT",
    "columns": [
        {
            "column_id": 1,
            "column_name": "PAT_ID",
            "mandatory": "false",
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        },
        {
            "column_id": 2,
            "column_name": "PAT_NAME",
            "mandatory": "false",
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        }
    ],
    "active_ind": "true"
}

I tried several approaches, grouping by column name and column ID:

  1. This groups the columns, but not all of the values: cac = df.groupby('column_id').apply(lambda x: x.to_json(orient='records'))

  2. cac = df.to_json(orient='records')

I cannot separate the shared ID fields from the per-column entries.
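For reference, here is a minimal sketch of what the second attempt returns, assuming the table above is loaded from an in-memory CSV string (the names `CSV_TEXT` and `flat_records` are illustrative, not from the original post). With `orient='records'`, pandas emits one flat object per row, so `_id` and the other shared fields repeat on every record instead of wrapping a `columns` list.

```python
import io
import json

import pandas as pd

# Assumption: the same three rows as in the table above, inlined as a string.
CSV_TEXT = """column_id,column_name,mandatory,column_data_type,column_data_length,_id,data_format,file_type,active_ind
1,PAT_ID,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
2,PAT_NAME,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
3,PAT_AGE,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Attempt 2 from the question: one flat JSON object per DataFrame row.
flat_records = json.loads(df.to_json(orient='records'))

print(len(flat_records))        # one record per row
print(sorted(flat_records[0]))  # all nine fields sit at the same level
```

Grouping by `column_id` has a similar limitation: every row has its own `column_id`, so each group holds a single row and nothing is nested under the shared fields.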

Please help.

1 Answer:

Answer 0: (score: 0)

Here is what I would do:

import json

import pandas as pd

# Load data
df = pd.read_csv('data.csv')

# Build one dict per row for the nested 'columns' field
col_set = ['column_id', 
           'column_name', 
           'mandatory', 
           'column_data_type', 
           'column_data_length']
df['columns'] = df[col_set].apply(lambda x: x.to_dict(), axis=1)
reorder = ['column_id', 
           'column_name', 
           'mandatory', 
           'column_data_type', 
           'column_data_length', 
           'columns', 
           'active_ind', 
           '_id', 
           'data_format', 
           'file_type']
df = df[reorder]

# Group rows that share the same top-level fields and collect their dicts into a list
col_set_2 = ['_id', 'data_format', 'file_type', 'columns', 'active_ind']
col_set_3 = ['_id', 'data_format', 'file_type', 'active_ind']
df2 = df[col_set_2].groupby(col_set_3)['columns'].apply(list).reset_index()
df2 = df2[col_set_2]

# Serialize to JSON, then pretty-print the first (and here only) record
parsed = json.loads(df2.to_json(orient='records'))
result = json.dumps(parsed[0], indent=4)

print(result)

Output:

{
    "_id": "5f2193c39448c44f0c1b65e0",
    "data_format": "TEXT",
    "file_type": "FACT",
    "columns": [
        {
            "column_id": 1,
            "column_name": "PAT_ID",
            "mandatory": false,
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        },
        {
            "column_id": 2,
            "column_name": "PAT_NAME",
            "mandatory": false,
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        },
        {
            "column_id": 3,
            "column_name": "PAT_AGE",
            "mandatory": false,
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        }
    ],
    "active_ind": true
}
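As an alternative, here is a hedged sketch that builds the nested structure directly while iterating over the groups, instead of storing dicts in a helper column. It is not the answer's method. The names `CSV_TEXT`, `HEADER_KEYS`, `COLUMN_KEYS`, `build_documents` and the `bools_as_strings` flag are all hypothetical. The sketch assumes the data from the question and round-trips through `to_json` so that NumPy scalar types become plain Python values. Setting `bools_as_strings=True` yields the lowercase `"false"` / `"true"` strings shown in the question rather than JSON booleans.

```python
import io
import json

import pandas as pd

# Assumption: the question's rows, inlined so the sketch runs on its own.
CSV_TEXT = """column_id,column_name,mandatory,column_data_type,column_data_length,_id,data_format,file_type,active_ind
1,PAT_ID,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
2,PAT_NAME,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
3,PAT_AGE,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
"""

HEADER_KEYS = ['_id', 'data_format', 'file_type']
COLUMN_KEYS = ['column_id', 'column_name', 'mandatory',
               'column_data_type', 'column_data_length']


def build_documents(df, bools_as_strings=False):
    """Return one nested dict per distinct set of top-level field values."""
    if bools_as_strings:
        # Optional: emit "true"/"false" strings, as in the question's target.
        df = df.copy()
        for name in ('mandatory', 'active_ind'):
            df[name] = df[name].map({True: 'true', False: 'false'})

    group_keys = HEADER_KEYS + ['active_ind']
    documents = []
    for _, group in df.groupby(group_keys, sort=False):
        # The to_json round trip converts NumPy scalars to plain Python types.
        header = json.loads(group[group_keys].head(1).to_json(orient='records'))[0]
        doc = {key: header[key] for key in HEADER_KEYS}
        doc['columns'] = json.loads(group[COLUMN_KEYS].to_json(orient='records'))
        doc['active_ind'] = header['active_ind']
        documents.append(doc)
    return documents


df = pd.read_csv(io.StringIO(CSV_TEXT))
documents = build_documents(df)
print(json.dumps(documents[0], indent=4))
```

Because the function returns a list, it also covers inputs with several distinct `_id` values, producing one document for each.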