I have a very long table that I want to convert into a shorter one based on a given column. The table looks like this:
import pandas as pd

finaldata = {
    'date': ['01/01/2018', '04/05/2018', '02/02/2019', '01/01/2018', '02/02/2019', '01/04/2019', '01/04/2019', '01/01/2018', '20/03/2019', '01/04/2019'],
    'change_type': ['modified', 'modified', 'modified', 'added', 'added', 'added', 'added', 'retired', 'retired', 'retired'],
    'Age_old': [20, 340, 21, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'Age_new': [23, 346, 217, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'diff_age': [3, 6, 96, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'miles_old': ['NAN', 99, 80, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'miles_new': ['NAN', 100, 89, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'diff_miles': ['NAN', 1, 9, 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN'],
    'distance': ['NAN', 'NAN', 'NAN', 'NAN', 56, 567, 234, 70, 78, 43],
    'Covered': ['NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', 'NAN', '67km', '80km', '56km'],
}
# Create DataFrame (missing values are stored as the string 'NAN')
final_df = pd.DataFrame(finaldata)
final_df
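I tried melting the table on the column called change_type, but that gives me a much larger dataset than the one I already have. Note that, depending on the value in change_type, different columns are filled in: the modified rows use one set of columns, and the added and retired rows use another. In short, the modified rows share the same base column names, distinguished only by the suffixes _old and _new and the prefix diff_, while the added and retired rows share the same column names as each other. If you look closely, you will also see that some dates are repeated.
This is the melt I tried: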
dff = pd.melt(final_df, id_vars =['date', 'change_type'])
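To show why the melted frame is so much larger: melt emits one row for every (input row, value column) pair, so the full table above (10 rows, 8 value columns) becomes 80 rows, most of them holding only the 'NAN' placeholder. Below is a minimal sketch on a cut-down, hypothetical subset of the data (the names sample, long_df and compact are mine, for illustration only); filtering out the placeholder afterwards already shrinks the result.

```python
import pandas as pd

# Hypothetical three-row subset of final_df, used only for illustration
sample = pd.DataFrame({
    'date': ['01/01/2018', '04/05/2018', '01/01/2018'],
    'change_type': ['modified', 'modified', 'retired'],
    'Age_old': [20, 340, 'NAN'],
    'Age_new': [23, 346, 'NAN'],
    'distance': ['NAN', 'NAN', 70],
})

# One output row per (input row, value column) pair: 3 rows * 3 columns
long_df = pd.melt(sample, id_vars=['date', 'change_type'])

# Drop the rows that only carry the 'NAN' placeholder string
compact = long_df[long_df['value'] != 'NAN'].reset_index(drop=True)

print(len(long_df), len(compact))
```

On this subset the melted frame has 9 rows and only 5 of them carry real values.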
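My goal is to shorten the columns, and even the rows, in a neat way so that anyone can understand the result: for example, only 5 columns called OLD FIELDS, NEW FIELDS, DIFF_FIELDS, added fields and retired fields. The expected table should look like the one below. Note that the values do not line up exactly with the original table, but this is how I want the columns organized: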
Output = {'date':['01/01/2018','04/05/2018' ,'02/02/2019','01/01/2018','04/05/2018' ,'02/02/2019','01/04/2019', '01/01/2018','20/03/2019','01/04/2019'],'modified_olds':[20, 340, 21, 'NAN',99,80,'NAN','NAN','NAN','NAN'],
'modified_new':[20,340, 21, 'NAN', 100, 89, 'NAN','NAN','NAN','NAN'],'diff_fields':[3,6, 96, 'NAN', 1, 9, 'NAN','NAN','NAN','NAN'] ,
'added_fields':['NAN', 'NAN', 'NAN','NAN',56,567,234,70,78,43],
'retired_fields':['NAN', 'NAN', 'NAN','NAN','NAN','NAN','NAN','67km','80km','56km']}
# Create DataFrame
Output_df = pd.DataFrame(Output)
Output_df
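For what it's worth, here is a rough sketch of one way the layout above might be reached. It is not a definitive solution. It assumes a hand-written mapping (the hypothetical LAYOUT dict) from every source column to a base field and a target column, and it sends distance to added_fields even on retired rows, because that is how the expected table above is filled in. The helper name shorten and the extra row and field columns are my own additions: row keeps the original row number so that repeated dates do not collide, and field records whether a value came from age, miles or distance.

```python
import pandas as pd

# Hypothetical mapping: source column -> (base field, target column)
LAYOUT = {
    'Age_old': ('age', 'modified_olds'),
    'Age_new': ('age', 'modified_new'),
    'diff_age': ('age', 'diff_fields'),
    'miles_old': ('miles', 'modified_olds'),
    'miles_new': ('miles', 'modified_new'),
    'diff_miles': ('miles', 'diff_fields'),
    'distance': ('distance', 'added_fields'),
    'Covered': ('distance', 'retired_fields'),
}

def shorten(df, layout=LAYOUT, placeholder='NAN'):
    ids = ['row', 'date', 'change_type']
    # Keep the original row number so rows with the same date stay separate
    long_df = df.rename_axis('row').reset_index().melt(id_vars=ids)
    # Drop cells that only hold the placeholder string
    long_df = long_df[long_df['value'] != placeholder].copy()
    # Every non-id column must appear in the mapping, otherwise this raises KeyError
    long_df['field'] = long_df['variable'].map(lambda c: layout[c][0])
    long_df['target'] = long_df['variable'].map(lambda c: layout[c][1])
    # Spread the target labels back out into columns
    wide = long_df.set_index(ids + ['field', 'target'])['value'].unstack('target')
    result = wide.reset_index()
    result.columns.name = None
    return result

# Hypothetical four-row subset of final_df, used only for illustration
sample = pd.DataFrame({
    'date': ['04/05/2018', '01/04/2019', '01/04/2019', '01/01/2018'],
    'change_type': ['modified', 'added', 'added', 'retired'],
    'Age_old': [340, 'NAN', 'NAN', 'NAN'],
    'Age_new': [346, 'NAN', 'NAN', 'NAN'],
    'diff_age': [6, 'NAN', 'NAN', 'NAN'],
    'miles_old': [99, 'NAN', 'NAN', 'NAN'],
    'miles_new': [100, 'NAN', 'NAN', 'NAN'],
    'diff_miles': [1, 'NAN', 'NAN', 'NAN'],
    'distance': ['NAN', 567, 234, 70],
    'Covered': ['NAN', 'NAN', 'NAN', '67km'],
})

result = shorten(sample)
print(result)
```

In this sketch a modified row becomes one line per field (age, miles), rows that contain only placeholders disappear, and cells with no value come back as NaN rather than the string 'NAN'.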