I have Excel data in the following format:
Time A               Time B                   NAME A   NAME B    NAME C
                                              Type A   Type B    Type C
                                              Celcius  Meters    Kgs
2019-03-01 00:00:00  2019-02-28 23:59:55.560  8.0285   410.1051  410.5469
2019-03-01 00:00:10  2019-03-01 00:00:05.776  8.0439   410.1051  410.5938
2019-03-01 00:00:20  2019-03-01 00:00:14.995  8.0439   410.2134  410.6875
2019-03-01 00:00:30  2019-03-01 00:00:25.226  8.0439   410.0781  410.5469
2019-03-01 00:00:40  2019-03-01 00:00:35.444  8.0285   410.0239  410.5312
2019-03-01 00:00:50  2019-03-01 00:00:45.676  8.0439   410.1592  410.609
I want to convert this data into a pandas DataFrame with the following columns:
Time A, Time B, Name, Type, Unit, Value
I tried the following code:
import pandas as pd
xl = pd.ExcelFile('testx.xlsm')
df = xl.parse(xl.sheet_names[0])
df1 = df.set_index(['Time A', 'Time B'])
df1.columns = [df1.columns,df1.iloc[0], df1.iloc[1]]
df1 = df1.iloc[2:].reset_index(drop=False)
df1.unstack(level=-1)
I then tried the code below and got something closer to what I want, but it uses a lot of memory.
xl = pd.ExcelFile('test2.xlsm')
df = xl.parse(xl.sheet_names[0], index_col=[0, 1], header=[0, 1, 2])
df1 = df.stack().stack().stack()
The expected result looks like this:
Time A               Time B                   Name    Type    Unit     Value
2019-03-01 00:00:00  2019-02-28 23:59:55.560  NAME A  Type A  Celcius  8.0285
                                              NAME B  Type B  Meters   410.1051
                                              NAME C  Type C  Kgs      410.5469
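For anyone reproducing this without the workbook, here is a minimal sketch of the target reshape on an in-memory frame. It assumes the sheet was parsed with two index columns and three header rows, as in the second attempt. The level names Name, Type and Unit, the variable names, and the two sample rows are chosen for illustration only. It uses DataFrame.melt with ignore_index=False (which I believe requires pandas 1.1 or later) rather than chained stack() calls.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the parsed sheet (values taken from the question).
index = pd.MultiIndex.from_arrays(
    [
        pd.to_datetime(["2019-03-01 00:00:00", "2019-03-01 00:00:10"]),
        pd.to_datetime(["2019-02-28 23:59:55.560", "2019-03-01 00:00:05.776"]),
    ],
    names=["Time A", "Time B"],
)
columns = pd.MultiIndex.from_arrays(
    [
        ["NAME A", "NAME B", "NAME C"],
        ["Type A", "Type B", "Type C"],
        ["Celcius", "Meters", "Kgs"],
    ],
    names=["Name", "Type", "Unit"],  # assumed level names, not present in the sheet
)
df = pd.DataFrame(
    [[8.0285, 410.1051, 410.5469], [8.0439, 410.1051, 410.5938]],
    index=index,
    columns=columns,
)

# melt turns every column level into its own column; ignore_index=False keeps
# the (Time A, Time B) index so reset_index can turn it into regular columns.
long_df = (
    df.melt(value_name="Value", ignore_index=False)
    .reset_index()
    .sort_values(["Time A", "Name"], ignore_index=True)
)
print(long_df)
```

Whether this actually uses less memory than the stack() chain on the real file was not measured here, so it is worth checking on a representative sample.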
Answer 0 (score: 0)
I think this should help you:
import pandas as pd
# One list per header row; the two time columns have no Type or Unit
arrays = [['Time A', 'Time B', 'NAME A', 'NAME B', 'NAME C'],
          ['', '', 'Type A', 'Type B', 'Type C'],
          ['', '', 'Celcius', 'Meters', 'Kgs']]
df.columns = pd.MultiIndex.from_arrays(arrays)
df
Assuming your current DataFrame already holds the Excel data (without the header rows), the output should be:
   Time A               Time B                   NAME A   NAME B    NAME C
                                                 Type A   Type B    Type C
                                                 Celcius  Meters    Kgs
0  2019-03-01 00:00:00  2019-02-28 23:59:55.560  8.0285   410.1051  410.5469
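A variation on the same idea, as a sketch only: keep the two timestamp columns in the row index so that the three header rows apply to the measurement columns alone. The header-less frame `raw`, its integer column labels, and the level names Name, Type and Unit are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical header-less frame: integer column labels 0..4, one row from the question.
raw = pd.DataFrame(
    [
        [
            pd.Timestamp("2019-03-01 00:00:00"),
            pd.Timestamp("2019-02-28 23:59:55.560"),
            8.0285,
            410.1051,
            410.5469,
        ]
    ]
)

# Move the two timestamp columns into the row index and give them names.
values = raw.set_index([0, 1])
values.index.names = ["Time A", "Time B"]

# Attach the three header rows to the three remaining measurement columns.
values.columns = pd.MultiIndex.from_arrays(
    [
        ["NAME A", "NAME B", "NAME C"],
        ["Type A", "Type B", "Type C"],
        ["Celcius", "Meters", "Kgs"],
    ],
    names=["Name", "Type", "Unit"],  # assumed level names
)
print(values)
```

With the timestamps in the index, the column header no longer needs empty strings as placeholders for the time columns.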
Answer 1 (score: 0)
Another working solution I found is:
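import logging
import os

import pandas as pd

# Load the first sheet of the workbook into a DataFrame
def load_file_in_df(fileName, filePath):
    logging.info("Loading file : " + fileName)
    if not os.path.isfile(filePath + fileName):
        # Fail early so that df_excel is never returned unbound
        raise FileNotFoundError("File does not exist: " + filePath + fileName)
    obj_xl = pd.ExcelFile(filePath + fileName)
    # The first two columns become the row index; the first three rows become the column header levels
    df_excel = obj_xl.parse(obj_xl.sheet_names[0], index_col=[0, 1], header=[0, 1, 2])
    return df_excel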
# Flatten the DataFrame into one (Time A, Time B, Name, Type, Unit, Value) tuple per cell
def parse_10sec_df(df_excel):
    rows, cols = df_excel.shape
    l_excel = []
    for row in df_excel.itertuples():
        # row[0] is the (Time A, Time B) index pair; row[1:] holds the cell values
        for i in range(cols):
            l = []
            l.append(row[0][0])
            l.append(row[0][1])
            l.append(df_excel.columns.values[i][0])  # Name
            l.append(df_excel.columns.values[i][1])  # Type
            l.append(df_excel.columns.values[i][2])  # Unit
            l.append(row[i + 1])                     # Value
            l_excel.append(tuple(l))
    return l_excel
The code above produces a list of tuples containing the data below.
Time A               Time B                   Name    Type    Unit     Value
2019-03-01 00:00:00  2019-02-28 23:59:55.560  NAME A  Type A  Celcius  8.0285
                                              NAME B  Type B  Meters   410.1051
                                              NAME C  Type C  Kgs      410.5469
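The nested loop can also be expressed with array operations, which returns a DataFrame directly instead of a list of tuples. This is a rough sketch rather than a drop-in replacement: the function name flatten_header_frame and the sample frame are hypothetical, and it assumes a frame with a two-level row index and three-level columns like the one returned by load_file_in_df.

```python
import numpy as np
import pandas as pd


def flatten_header_frame(df_excel):
    """Return one row per cell: Time A, Time B, Name, Type, Unit, Value."""
    n_rows, n_cols = df_excel.shape
    return pd.DataFrame(
        {
            # Each index entry is repeated once per column ...
            "Time A": df_excel.index.get_level_values(0).repeat(n_cols),
            "Time B": df_excel.index.get_level_values(1).repeat(n_cols),
            # ... and the full set of header labels is tiled once per row.
            "Name": np.tile(df_excel.columns.get_level_values(0).to_numpy(), n_rows),
            "Type": np.tile(df_excel.columns.get_level_values(1).to_numpy(), n_rows),
            "Unit": np.tile(df_excel.columns.get_level_values(2).to_numpy(), n_rows),
            # Row-major flattening matches the repeat/tile order above.
            "Value": df_excel.to_numpy().ravel(),
        }
    )


# Hypothetical sample frame with two rows taken from the question.
index = pd.MultiIndex.from_arrays(
    [
        pd.to_datetime(["2019-03-01 00:00:00", "2019-03-01 00:00:10"]),
        pd.to_datetime(["2019-02-28 23:59:55.560", "2019-03-01 00:00:05.776"]),
    ]
)
columns = pd.MultiIndex.from_arrays(
    [
        ["NAME A", "NAME B", "NAME C"],
        ["Type A", "Type B", "Type C"],
        ["Celcius", "Meters", "Kgs"],
    ]
)
sample = pd.DataFrame(
    [[8.0285, 410.1051, 410.5469], [8.0439, 410.1051, 410.5938]],
    index=index,
    columns=columns,
)
long_df = flatten_header_frame(sample)
print(long_df)
```

The rows come out in the same order as the loop produces them: all columns of the first timestamp pair, then all columns of the next.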