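I have a complex worksheet in Excel that I want to read into multiple pandas.DataFrames.
Basically, there are three data frames here: one each for DIRECTION_A, DIRECTION_B, and TOTAL.
How can I tell pandas to read each data frame separately? I could use iloc to specify the boundaries, but since I am iterating over many different spreadsheets, there is a risk that the positions differ between them.
Currently, I read all of these columns at once by skipping the first 7 rows: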
tmp_df = pd.read_excel(file, sheet_name=sheet_name, skiprows=7)
Answer 0 (score: 1)
I don't think you can tell pandas to read these frames separately, but they are easy to separate after reading.
Code:
import pandas as pd

def get_multi_frame_excel(*args, **kwargs):
    # read in the frame, with a multi level column index
    multi_frame = pd.read_excel(*args, header=[0, 1], **kwargs)
    # group the data by the top level column index, and store in dict
    frames = {name: group for name, group in
              multi_frame.groupby(level=0, axis=1)}
    # remove the top level index from the frames
    for frame in frames.values():
        frame.columns = frame.columns.droplevel(level=0)
    # return a dict of frames
    return frames
Test code:
frames = get_multi_frame_excel('SO_split_df.xlsx', skiprows=1)
for name, frame in frames.items():
    print('---')
    print(name)
    print(frame)
Results:
---
DIRECTION_A
Time A B C D E F G H
00:00:00 0 0 0 0 0 0 0 0
00:15:00 0 0 0 0 0 0 0 0
00:30:00 0 0 0 0 0 0 0 0
....
09:00:00 3 1 0 0 0 0 1 5
09:15:00 1 0 0 0 0 0 1 2
09:30:00 1 0 0 0 0 0 1 2
---
TOTAL
Time A B C D E F G H
00:00:00 1 0 0 0 0 0 0 1
00:15:00 0 0 0 0 0 0 0 0
00:30:00 0 0 0 0 0 0 0 0
....
09:00:00 7 1 0 0 0 0 1 9
09:15:00 4 0 0 0 0 0 3 7
09:30:00 3 0 0 0 0 0 1 4
---
DIRECTION_B
Time A B C D E F G H
00:00:00 1 0 0 0 0 0 0 1
00:15:00 0 0 0 0 0 0 0 0
00:30:00 0 0 0 0 0 0 0 0
....
09:00:00 4 0 0 0 0 0 0 4
09:15:00 3 0 0 0 0 0 2 5
09:30:00 2 0 0 0 0 0 0 2
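For readers on newer pandas, where groupby(axis=1) is deprecated (since 2.1), the same split can be sketched without groupby. This is only an illustration, not the answer's original method: split_by_top_level is a hypothetical helper name, and the small demo frame is made-up data standing in for what pd.read_excel(..., header=[0, 1]) would return.

```python
import pandas as pd


def split_by_top_level(multi_frame):
    """Split a frame with two-level columns into one frame per top-level label."""
    frames = {}
    # unique() keeps the labels in their original left-to-right order
    for name in multi_frame.columns.get_level_values(0).unique():
        # xs selects the columns under `name` and drops that level from the result
        frames[name] = multi_frame.xs(name, axis=1, level=0)
    return frames


# Hypothetical stand-in for the frame read from the spreadsheet (made-up values)
columns = pd.MultiIndex.from_product(
    [["DIRECTION_A", "DIRECTION_B", "TOTAL"], ["A", "B"]]
)
demo = pd.DataFrame(
    [[0, 1, 2, 3, 2, 4], [1, 0, 0, 2, 1, 2]],
    index=["00:00:00", "00:15:00"],
    columns=columns,
)

frames = split_by_top_level(demo)
for name, frame in frames.items():
    print("---")
    print(name)
    print(frame)
```

Because the frames are keyed by the header label rather than by column position, this approach should tolerate spreadsheets whose blocks sit at different offsets, as long as the two header rows are laid out the same way.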