Converting a long table into a shorter one

Time: 2019-09-04 06:27:19

Tags: python pandas

I have a very long table that I would like to convert into a shorter one based on a given column. The table looks like this:

import pandas as pd

# 'NAN' is a string placeholder for "no value", not a real NaN
finaldata = {
    'date': ['01/01/2018', '04/05/2018', '02/02/2019', '01/01/2018', '02/02/2019', '01/04/2019', '01/04/2019', '01/01/2018', '20/03/2019', '01/04/2019'],
    'change_type': ['modified', 'modified', 'modified', 'added', 'added', 'added', 'added', 'retired', 'retired', 'retired'],
    'Age_old': [20, 340, 21, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'Age_new': [23, 346, 217, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'diff_age': [3, 6, 96, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'miles_old': ['NAN', 99, 80, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'miles_new': ['NAN', 100, 89, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'diff_miles': ['NAN', 1, 9, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'distance': ['NAN', 'NAN', 'NAN', 'NAN', 56, 567, 234, 70, 78, 43],
    'Covered': ['NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', '67km', '80km', '56km'],
}

# Create DataFrame
final_df = pd.DataFrame(finaldata)
final_df

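I tried melting the table on the change_type column, but that gave me a dataset much larger than the one I started with. Note that which columns are filled in depends on change_type: modified rows use a different set of columns than added and retired rows. In short, modified rows share the same base column names, distinguished only by the _old and _new suffixes and the diff_ prefix, while added and retired rows share their own columns. If you look closely, you will also see that some dates are repeated.

This is the melt code I tried: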

dff = pd.melt(final_df, id_vars =['date', 'change_type'])
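The size increase follows from how `pd.melt` works: it emits one output row for every combination of input row and value column, including cells that only hold the `'NAN'` placeholder. The sketch below uses a two-row excerpt of the data (the names `excerpt`, `long_df` and `filtered` are illustrative, not from the original post) and then drops the placeholder rows, assuming `'NAN'` always means "no value".

```python
import pandas as pd

# Two "modified" rows taken from the question's data, with three value columns.
excerpt = pd.DataFrame({
    "date": ["01/01/2018", "04/05/2018"],
    "change_type": ["modified", "modified"],
    "Age_old": [20, 340],
    "Age_new": [23, 346],
    "miles_old": ["NAN", 99],
})

# melt produces one row per (input row, value column) pair: 2 rows * 3 columns.
long_df = pd.melt(excerpt, id_vars=["date", "change_type"])

# Assumption: the string 'NAN' marks an empty cell, so those rows carry no data.
filtered = long_df[long_df["value"] != "NAN"]

print(len(long_df), len(filtered))
```

Applied to the full table, the same call yields 10 rows times 8 value columns, most of which are placeholders that can be filtered out in the same way.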

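My goal is to reduce the number of columns, and possibly rows, in a tidy way so that anyone can understand the result, for example with only five columns named OLD FIELDS, NEW FIELDS, DIFF_FIELDS, ADDED FIELDS and RETIRED FIELDS. The expected table should look like the one below. Note that the values do not line up exactly with the original table, but this is how I would like the columns to be organized: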

Output = {
    'date': ['01/01/2018', '04/05/2018', '02/02/2019', '01/01/2018', '04/05/2018', '02/02/2019', '01/04/2019', '01/01/2018', '20/03/2019', '01/04/2019'],
    'modified_olds': [20, 340, 21, 'NAN', 99, 80, 'NAN', 'NAN', 'NAN', 'NAN'],
    'modified_new': [20, 340, 21, 'NAN', 100, 89, 'NAN', 'NAN', 'NAN', 'NAN'],
    'diff_fields': [3, 6, 96, 'NAN', 1, 9, 'NAN', 'NAN', 'NAN', 'NAN'],
    'added_fields': ['NAN', 'NAN', 'NAN', 'NAN', 56, 567, 234, 70, 78, 43],
    'retired_fields': ['NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', '67km', '80km', '56km'],
}

# Create DataFrame 
Output_df = pd.DataFrame(Output) 
Output_df
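One possible way to get close to that layout is to melt, drop the placeholders, label each source column with its target column and base field name, and then unstack the targets back into columns. This is only a sketch under several assumptions: `'NAN'` always means "no value", `distance` maps to `added_fields` and `Covered` to `retired_fields` (as in the expected table), and the helper names `classify` and `shorten` plus the `row`, `field` and `target` columns are made up for illustration. It produces one row per original row and field, so it will not reproduce the hand-written expected table cell for cell.

```python
import pandas as pd

PLACEHOLDER = "NAN"  # assumption: this string marks an empty cell


def classify(column):
    """Map a source column name to (target column, base field name)."""
    name = column.lower()
    if name.endswith("_old"):
        return "modified_old", name[: -len("_old")]
    if name.endswith("_new"):
        return "modified_new", name[: -len("_new")]
    if name.startswith("diff_"):
        return "diff_fields", name[len("diff_"):]
    if name == "distance":
        return "added_fields", name
    if name == "covered":
        return "retired_fields", name
    raise ValueError(f"no rule for column {column!r}")


def shorten(df):
    # Keep the original row number so repeated dates do not collide later.
    long_df = df.rename_axis("row").reset_index().melt(
        id_vars=["row", "date", "change_type"], var_name="source"
    )
    long_df = long_df[long_df["value"] != PLACEHOLDER]
    long_df = long_df.assign(
        target=long_df["source"].map(lambda c: classify(c)[0]),
        field=long_df["source"].map(lambda c: classify(c)[1]),
    )
    # Spread the target labels back out into columns.
    wide = long_df.set_index(
        ["row", "date", "change_type", "field", "target"]
    )["value"].unstack("target")
    wide = wide.reset_index()
    wide.columns.name = None
    return wide


# Four rows taken from the question's data.
sample = pd.DataFrame({
    "date": ["04/05/2018", "01/04/2019", "01/04/2019", "20/03/2019"],
    "change_type": ["modified", "added", "added", "retired"],
    "Age_old": [340, "NAN", "NAN", "NAN"],
    "Age_new": [346, "NAN", "NAN", "NAN"],
    "diff_age": [6, "NAN", "NAN", "NAN"],
    "miles_old": [99, "NAN", "NAN", "NAN"],
    "miles_new": [100, "NAN", "NAN", "NAN"],
    "diff_miles": [1, "NAN", "NAN", "NAN"],
    "distance": ["NAN", 567, 234, 78],
    "Covered": ["NAN", "NAN", "NAN", "80km"],
})
short_df = shorten(sample)
print(short_df)
```

Carrying the `row` number through the reshape is a deliberate choice here: the two added rows dated 01/04/2019 would otherwise share the same key and make the unstack step fail on duplicate entries.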

0 answers:

No answers