How to convert a multi-header Excel sheet into a simple table in pandas

Date: 2019-03-21 22:51:48

Tags: python pandas dataframe

I have Excel data in the following format:

Time A  Time B                              NAME A          NAME B          NAME C
                                            Type A          Type B          Type C
                                            Celcius         Meters          Kgs
2019-03-01 00:00:00 2019-02-28 23:59:55.560 8.0285          410.1051        410.5469
2019-03-01 00:00:10 2019-03-01 00:00:05.776 8.0439          410.1051        410.5938
2019-03-01 00:00:20 2019-03-01 00:00:14.995 8.0439          410.2134        410.6875
2019-03-01 00:00:30 2019-03-01 00:00:25.226 8.0439          410.0781        410.5469
2019-03-01 00:00:40 2019-03-01 00:00:35.444 8.0285          410.0239        410.5312
2019-03-01 00:00:50 2019-03-01 00:00:45.676 8.0439          410.1592        410.609

I want to convert the data above into a pandas DataFrame with these columns:

Time A, Time B, Name, Type, Unit, Value

I tried the following code:

import pandas as pd
xl = pd.ExcelFile('testx.xlsm')
df = xl.parse(xl.sheet_names[0])
df1 =  df.set_index(['Time A', 'Time B'])
df1.columns = [df1.columns,df1.iloc[0], df1.iloc[1]]
df1 = df1.iloc[2:].reset_index(drop=False)
df1.unstack(level=-1)

I also tried the code below and got something closer to what I want, but it uses a lot of memory.

xl = pd.ExcelFile('test2.xlsm', )
df = xl.parse(xl.sheet_names[0],index_col=[0,1], header=[0,1,2] )
df1 = df.stack().stack().stack()

The expected result looks like this:

Time A              Time B                      name        Type            Unit                        Value
2019-03-01 00:00:00 2019-02-28 23:59:55.560     NAME A      Type A          Celcius                     8.0285          
                                                NAME B      Type B          Meters                      410.1051        
                                                NAME C      Type C          Kgs                         410.5469

2 answers:

Answer 0 (score: 0)

I think this should help you:

import pandas as pd

# One list per header row: names, types, units
arrays = [
    ['Time A', 'Time B', 'NAME A', 'NAME B', 'NAME C'],
    ['', '', 'Type A', 'Type B', 'Type C'],
    ['', '', 'Celcius', 'Meters', 'Kgs'],
]

df.columns = pd.MultiIndex.from_arrays(arrays)
df

Assuming your current DataFrame already holds the Excel data (without the header rows), the output should be:

        Time A               Time B                   NAME A   NAME B    NAME C
                                                      Type A   Type B    Type C
                                                      Celcius  Meters    Kgs
  0     2019-03-01 00:00:00  2019-02-28 23:59:55.560  8.0285   410.1051  410.5469
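
Once the three header rows are a column MultiIndex, one possible way to reach the long layout the question asks for is `DataFrame.melt`. The sketch below is only an illustration under some assumptions: the sample frame is built by hand from the first two rows of the question's data, the two time columns are kept in the row index (as in the question's `index_col=[0, 1]` call) rather than as columns, and the level names `Name`, `Type`, `Unit` and the variable name `long_df` are made up for the example. It needs pandas 1.1 or later for `ignore_index`.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout: the two time columns
# form the row index and the three header rows form the column MultiIndex.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2019-03-01 00:00:00"), pd.Timestamp("2019-02-28 23:59:55.560")),
        (pd.Timestamp("2019-03-01 00:00:10"), pd.Timestamp("2019-03-01 00:00:05.776")),
    ],
    names=["Time A", "Time B"],
)
columns = pd.MultiIndex.from_arrays(
    [
        ["NAME A", "NAME B", "NAME C"],
        ["Type A", "Type B", "Type C"],
        ["Celcius", "Meters", "Kgs"],
    ],
    names=["Name", "Type", "Unit"],  # assumed level names
)
df = pd.DataFrame(
    [[8.0285, 410.1051, 410.5469], [8.0439, 410.1051, 410.5938]],
    index=index,
    columns=columns,
)

# melt turns each named column level into its own column; ignore_index=False
# keeps the (Time A, Time B) index so reset_index can restore it as columns.
long_df = (
    df.melt(ignore_index=False, value_name="Value")
    .reset_index()
    .sort_values(["Time A", "Time B", "Name"])
    .reset_index(drop=True)
)
print(long_df)
```

Because `melt` builds the long table in a single step, it should avoid the sparse intermediate frames that chained `stack()` calls can create, although the actual memory use depends on the data and the pandas version.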

Answer 1 (score: 0)

Another working solution I found is:

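import logging
import os

import pandas as pd

# Load the first sheet of the workbook into a DataFrame
def load_file_in_df(fileName, filePath):
    logging.info("Loading file : " + fileName)

    if not os.path.isfile(filePath + fileName):
        # Fail loudly instead of returning an undefined variable
        raise FileNotFoundError("File does not exist: " + filePath + fileName)

    obj_xl = pd.ExcelFile(filePath + fileName)
    # Columns 0-1 (Time A, Time B) become the index; rows 0-2 become a 3-level header
    df_excel = obj_xl.parse(obj_xl.sheet_names[0], index_col=[0, 1], header=[0, 1, 2])
    return df_excel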

# Flatten the DataFrame into one tuple per (row, column) pair
def parse_10sec_df(df_excel):
    rows, cols = df_excel.shape
    l_excel = []
    for row in df_excel.itertuples():
        for i in range(cols):
            l = []
            l.append(row[0][0])  # Time A (first index level)
            l.append(row[0][1])  # Time B (second index level)

            l.append(df_excel.columns.values[i][0])  # Name
            l.append(df_excel.columns.values[i][1])  # Type
            l.append(df_excel.columns.values[i][2])  # Unit
            l.append(row[i + 1])  # Value; row[0] holds the index
            l_excel.append(tuple(l))
    return l_excel

The code above produces a list of tuples containing the data below.

Time A              Time B                      name        Type            Unit                        Value
2019-03-01 00:00:00 2019-02-28 23:59:55.560     NAME A      Type A          Celcius                     8.0285          
                                                NAME B      Type B          Meters                      410.1051        
                                                NAME C      Type C          Kgs                         410.5469
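
If a DataFrame is wanted rather than a list of tuples, the list can be passed straight to the `pd.DataFrame` constructor. This is a minimal sketch: the `records` list is a hand-written stand-in for the return value of `parse_10sec_df`, copied from the first row of the question's data, and the column labels and the names `records` and `result` are assumptions chosen to match the expected output.

```python
import pandas as pd

# Hypothetical stand-in for the list returned by parse_10sec_df:
# one tuple per (row, column) pair of the original sheet.
time_a = pd.Timestamp("2019-03-01 00:00:00")
time_b = pd.Timestamp("2019-02-28 23:59:55.560")
records = [
    (time_a, time_b, "NAME A", "Type A", "Celcius", 8.0285),
    (time_a, time_b, "NAME B", "Type B", "Meters", 410.1051),
    (time_a, time_b, "NAME C", "Type C", "Kgs", 410.5469),
]

# Each tuple becomes one row; the labels follow the layout requested in the question.
result = pd.DataFrame(
    records,
    columns=["Time A", "Time B", "Name", "Type", "Unit", "Value"],
)
print(result)
```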