Converting Tableau CSV output into a sensible pandas DataFrame

Time: 2018-11-30 00:16:30

Tags: python pandas tableau pandas-groupby

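I received some ugly CSVs exported from a Tableau worksheet. I have only used Tableau once before, and I remember having to whip the data into a certain shape before the visualizations would work. The data is in an odd format, similar to what you get when you create a multi-index pandas pivot table. I want to convert the data I get from the CSV (aka df1) into a readable format (df2). Here is a small example of the data in df1: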

import pandas as pd

# Rows 0-2 are header rows from the export; the actual data starts at row 3.
df1 = {
    'Unnamed: 0': ['', '', 'category_id', 'action_a', 'action_a', 'action_b', 'action_b'],
    'Unnamed: 1': ['', '', 'id1', 'blue', 'blue', 'blue', 'blue'],
    'Unnamed: 2': ['', '', 'id2', '1', '1', '1', '1'],
    'Unnamed: 3': ['', '', 'id3', '2', '2', '2', '2'],
    'Unnamed: 4': ['', '', 'id4', '3', '4', '3', '4'],
    'Unnamed: 5': ['', '', 'combo_id', 'blue_1_2_3', 'blue_1_2_4', 'blue_1_2_3', 'blue_1_2_4'],
    'Unnamed: 6': ['', '20181112', '00:00', '0.6', '0.5', '0', '4'],
    'Unnamed: 7': ['', '20181112', '00:15', '1.6', '0.8', '4', '10'],
    'Unnamed: 8': ['', '20181112', '00:30', '1.2', '0.8', '2', '2'],
    'Unnamed: 9': ['', '20181112', '00:45', '0.8', '1.1', '0', '2'],
}
df1 = pd.DataFrame(data=df1)

df1:
    Unnamed: 0 Unnamed: 1 Unnamed: 2    ...     Unnamed: 7 Unnamed: 8 Unnamed: 9
0                                       ...                                     
1                                       ...       20181112   20181112   20181112
2  category_id        id1        id2    ...          00:15      00:30      00:45
3     action_a       blue          1    ...            1.6        1.2        0.8
4     action_a       blue          1    ...            0.8        0.8        1.1
5     action_b       blue          1    ...              4          2          0
6     action_b       blue          1    ...             10          2          2

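The end goal is to convert df1 into df2. Should I transpose each row as I read it? I did some research and found that pandas has a "melt" function for unpivoting data, but I could not get it to work correctly. Any help or ideas would be greatly appreciated. Thanks!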

df2 = {'year_date':['20181112','20181112','20181112','20181112','20181112','20181112','20181112','20181112'], 'time_id':['00:00','00:15','00:30','00:45','00:00','00:15','00:30','00:45'], 'color_id':['blue','blue','blue','blue','blue','blue','blue','blue'], 'id_2':['1','1','1','1','1','1','1','1'], 'id3':['2','2','2','2','2','2','2','2'], 'id4':['3','3','3','3','4','4','4','4'], 'combo_id':['blue_1_2_3','blue_1_2_3','blue_1_2_3','blue_1_2_3','blue_1_2_4','blue_1_2_4','blue_1_2_4','blue_1_2_4'], 'action_a':['0.6','1.6','1.2','0.8','0.5','0.8','0.8','1.1'], 'action_b':['0', '4','2','0','4','10','2','2']}
df2 = pd.DataFrame(data=df2)

df2: 
  year_date time_id color_id id_2 id3 id4    combo_id action_a action_b
0  20181112   00:00     blue    1   2   3  blue_1_2_3      0.6        0
1  20181112   00:15     blue    1   2   3  blue_1_2_3      1.6        4
2  20181112   00:30     blue    1   2   3  blue_1_2_3      1.2        2
3  20181112   00:45     blue    1   2   3  blue_1_2_3      0.8        0
4  20181112   00:00     blue    1   2   4  blue_1_2_4      0.5        4
5  20181112   00:15     blue    1   2   4  blue_1_2_4      0.8       10
6  20181112   00:30     blue    1   2   4  blue_1_2_4      0.8        2
7  20181112   00:45     blue    1   2   4  blue_1_2_4      1.1        2
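
For reference, here is a rough sketch of one way the melt idea could be made to work on the sample above. It is not a confirmed solution. It assumes the layout is always the same as in the sample: row 1 holds the dates, row 2 holds the id column names plus the times, data starts at row 3, and the first six columns are identifiers. The names `N_ID`, `slots`, `body`, `tidy`, `wide` and `result` are made up for illustration, and the values are converted to numbers, whereas df2 above keeps them as strings.

```python
import pandas as pd

# Rebuild the sample df1 from the question, row by row.
rows = [
    ['', '', '', '', '', '', '', '', '', ''],
    ['', '', '', '', '', '', '20181112', '20181112', '20181112', '20181112'],
    ['category_id', 'id1', 'id2', 'id3', 'id4', 'combo_id', '00:00', '00:15', '00:30', '00:45'],
    ['action_a', 'blue', '1', '2', '3', 'blue_1_2_3', '0.6', '1.6', '1.2', '0.8'],
    ['action_a', 'blue', '1', '2', '4', 'blue_1_2_4', '0.5', '0.8', '0.8', '1.1'],
    ['action_b', 'blue', '1', '2', '3', 'blue_1_2_3', '0', '4', '2', '0'],
    ['action_b', 'blue', '1', '2', '4', 'blue_1_2_4', '4', '10', '2', '2'],
]
df1 = pd.DataFrame(rows, columns=[f'Unnamed: {i}' for i in range(10)])

N_ID = 6  # assumption: the first six columns are identifiers, the rest are time slots

# Pull the header information out of rows 1 and 2.
id_names = df1.iloc[2, :N_ID].tolist()
dates = df1.iloc[1, N_ID:].tolist()
times = df1.iloc[2, N_ID:].tolist()
slots = [f'{d} {t}' for d, t in zip(dates, times)]  # one unique label per value column

# Keep only the data rows and give them real column names.
body = df1.iloc[3:].reset_index(drop=True)
body.columns = id_names + slots

# Unpivot: one row per (identifier combination, time slot).
tidy = body.melt(id_vars=id_names, var_name='slot', value_name='value')
tidy['year_date'] = tidy['slot'].map(dict(zip(slots, dates)))
tidy['time_id'] = tidy['slot'].map(dict(zip(slots, times)))
tidy['value'] = pd.to_numeric(tidy['value'])

# Spread category_id (action_a / action_b) back out into columns.
keys = ['id1', 'id2', 'id3', 'id4', 'combo_id', 'year_date', 'time_id']
wide = (
    tidy.set_index(keys + ['category_id'])['value']
        .unstack('category_id')
        .sort_index()
        .reset_index()
)
wide.columns.name = None

# Rename and reorder to match the column names used in df2.
result = wide.rename(columns={'id1': 'color_id', 'id2': 'id_2'})[
    ['year_date', 'time_id', 'color_id', 'id_2', 'id3', 'id4',
     'combo_id', 'action_a', 'action_b']
]
print(result)
```

`set_index(...).unstack(...)` is used here rather than `pivot_table` so that duplicate key combinations raise an error instead of being silently aggregated. With a real export the number of identifier columns and header rows may differ, so `N_ID` and the row offsets would need checking against the actual file.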

0 answers:

No answers yet