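I received some ugly CSVs exported from a Tableau worksheet. I have only used Tableau once before, and I remember having to whip the data into a certain shape before it could be used for visualizations. The data is in an odd format, similar to what you get when you create a multi-index pandas pivot table. I want to convert the data I get from the CSV (aka df1) into a readable format (df2). Here is a small example of the data in df1: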
import pandas as pd

# Sample of the raw CSV contents: row 1 holds the date, row 2 holds the
# real column labels (id names and times), and rows 3+ hold the values.
df1 = {
    'Unnamed: 0': ['', '', 'category_id', "action_a", "action_a", "action_b", "action_b"],
    'Unnamed: 1': ['', '', 'id1', "blue", "blue", "blue", "blue"],
    'Unnamed: 2': ['', '', 'id2', "1", "1", "1", "1"],
    'Unnamed: 3': ['', '', 'id3', "2", "2", "2", "2"],
    'Unnamed: 4': ['', '', 'id4', "3", "4", "3", "4"],
    'Unnamed: 5': ['', '', 'combo_id', "blue_1_2_3", "blue_1_2_4", "blue_1_2_3", "blue_1_2_4"],
    'Unnamed: 6': ['', '20181112', '00:00', "0.6", "0.5", "0", "4"],
    'Unnamed: 7': ['', '20181112', '00:15', "1.6", "0.8", "4", "10"],
    'Unnamed: 8': ['', '20181112', '00:30', "1.2", "0.8", "2", "2"],
    'Unnamed: 9': ['', '20181112', '00:45', "0.8", "1.1", "0", "2"],
}
df1 = pd.DataFrame(data=df1)
df1:
    Unnamed: 0 Unnamed: 1 Unnamed: 2  ... Unnamed: 7 Unnamed: 8 Unnamed: 9
0                                     ...
1                                     ...   20181112   20181112   20181112
2  category_id        id1        id2  ...      00:15      00:30      00:45
3     action_a       blue          1  ...        1.6        1.2        0.8
4     action_a       blue          1  ...        0.8        0.8        1.1
5     action_b       blue          1  ...          4          2          0
6     action_b       blue          1  ...         10          2          2
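The end goal is to transform df1 into df2. Should I transpose it as I read each row? I did some research and found that pandas has a "melt" function for unpivoting data, but I can't get it to work correctly. Any help/ideas would be greatly appreciated. Thanks!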
df2 = {'year_date':['20181112','20181112','20181112','20181112','20181112','20181112','20181112','20181112'], 'time_id':['00:00','00:15','00:30','00:45','00:00','00:15','00:30','00:45'], 'color_id':['blue','blue','blue','blue','blue','blue','blue','blue'], 'id_2':['1','1','1','1','1','1','1','1'], 'id3':['2','2','2','2','2','2','2','2'], 'id4':['3','3','3','3','4','4','4','4'], 'combo_id':['blue_1_2_3','blue_1_2_3','blue_1_2_3','blue_1_2_3','blue_1_2_4','blue_1_2_4','blue_1_2_4','blue_1_2_4'], 'action_a':['0.6','1.6','1.2','0.8','0.5','0.8','0.8','1.1'], 'action_b':['0', '4','2','0','4','10','2','2']}
df2 = pd.DataFrame(data=df2)
df2:
   year_date  time_id  color_id  id_2  id3  id4    combo_id  action_a  action_b
0   20181112    00:00      blue     1    2    3  blue_1_2_3       0.6         0
1   20181112    00:15      blue     1    2    3  blue_1_2_3       1.6         4
2   20181112    00:30      blue     1    2    3  blue_1_2_3       1.2         2
3   20181112    00:45      blue     1    2    3  blue_1_2_3       0.8         0
4   20181112    00:00      blue     1    2    4  blue_1_2_4       0.5         4
5   20181112    00:15      blue     1    2    4  blue_1_2_4       0.8        10
6   20181112    00:30      blue     1    2    4  blue_1_2_4       0.8         2
7   20181112    00:45      blue     1    2    4  blue_1_2_4       1.1         2
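
For reference, here is a rough sketch of the direction I think the melt idea would have to take. It is not a verified solution. It assumes the first six columns are always the identifier columns, that row 1 carries the date and row 2 carries the real labels (as in the sample above), and that each combination of ids, date and time appears at most once per action. The names n_id, stamp, long_df and result are my own placeholders.

```python
import pandas as pd

# Rebuild the sample df1 from above, row by row (all values are strings).
rows = [
    ["", "", "", "", "", "", "", "", "", ""],
    ["", "", "", "", "", "", "20181112", "20181112", "20181112", "20181112"],
    ["category_id", "id1", "id2", "id3", "id4", "combo_id", "00:00", "00:15", "00:30", "00:45"],
    ["action_a", "blue", "1", "2", "3", "blue_1_2_3", "0.6", "1.6", "1.2", "0.8"],
    ["action_a", "blue", "1", "2", "4", "blue_1_2_4", "0.5", "0.8", "0.8", "1.1"],
    ["action_b", "blue", "1", "2", "3", "blue_1_2_3", "0", "4", "2", "0"],
    ["action_b", "blue", "1", "2", "4", "blue_1_2_4", "4", "10", "2", "2"],
]
df1 = pd.DataFrame(rows, columns=[f"Unnamed: {i}" for i in range(10)])

n_id = 6  # assumption: the first six columns identify a row
id_names = df1.iloc[2, :n_id].tolist()   # category_id, id1, ..., combo_id
dates = df1.iloc[1, n_id:].tolist()      # date header row
times = df1.iloc[2, n_id:].tolist()      # time header row

# Keep only the value rows and give every column a proper label.
body = df1.iloc[3:].reset_index(drop=True)
body.columns = id_names + [f"{d} {t}" for d, t in zip(dates, times)]

# Unpivot: one row per (ids, date/time) pair.
long_df = body.melt(id_vars=id_names, var_name="stamp", value_name="value")
parts = long_df["stamp"].str.split(" ", expand=True)
long_df["year_date"] = parts[0]
long_df["time_id"] = parts[1]

# Spread category_id back out so each action becomes its own column.
keys = ["year_date", "time_id", "id1", "id2", "id3", "id4", "combo_id"]
result = (
    long_df.set_index(keys + ["category_id"])["value"]
    .unstack("category_id")
    .reset_index()
    .sort_values(["combo_id", "year_date", "time_id"])
    .reset_index(drop=True)
    .rename(columns={"id1": "color_id", "id2": "id_2"})
)
result.columns.name = None  # drop the leftover "category_id" axis label
```

With the sample data, result should line up with the df2 layout. All values stay as strings, so a pd.to_numeric call on the action columns would still be needed before doing any arithmetic.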