Building data frames from a single worksheet in pandas

Time: 2017-03-17 14:32:44

Tags: python excel pandas

I have a complex worksheet in Excel that I want to read into multiple pandas.DataFrames.


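Basically, there are 3 data frames here: one each for DIRECTION_A, DIRECTION_B and TOTAL.

How can I tell pandas to read each data frame separately? I could use iloc to specify the boundaries, but since I am iterating over many different spreadsheets, there is a risk that the frames sit at different positions.

Currently, I read all of these columns at once by skipping the first 7 rows: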

tmp_df = pd.read_excel(file, sheet_name=sheet_name, skiprows=7)

(Sample data)

1 answer:

Answer 0 (score: 1)

I don't think you can tell pandas to read these frames separately, but they can easily be split apart after reading.

Code:

import pandas as pd


def get_multi_frame_excel(*args, **kwargs):
    # read in the frame, with a multi level column index
    multi_frame = pd.read_excel(*args, header=[0, 1], **kwargs)

    # group the data by the top level column index, and store in dict
    frames = {name: group for name, group in
              multi_frame.groupby(level=0, axis=1)}

    # remove the top level index from the frames
    for frame in frames.values():
        frame.columns = frame.columns.droplevel(level=0)

    # return a dict of frames
    return frames
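
As far as I know, recent pandas releases deprecate the axis argument of DataFrame.groupby, so the column-wise grouping above may warn or fail there. A minimal sketch of the same split that avoids groupby and selects each top-level column key directly is shown below. The helper name split_by_top_level and the small demo frame are hypothetical stand-ins (not from the original answer) for the result of pd.read_excel(..., header=[0, 1]).

```python
import pandas as pd


def split_by_top_level(multi_frame):
    """Split a frame with two-level columns into a dict keyed by the top level."""
    top_names = multi_frame.columns.get_level_values(0).unique()
    # Selecting by a top-level key returns the sub-frame with that level dropped
    return {name: multi_frame[name].copy() for name in top_names}


# Hypothetical stand-in for a frame read with a two-row header
columns = pd.MultiIndex.from_product(
    [["DIRECTION_A", "DIRECTION_B", "TOTAL"], ["A", "B"]]
)
demo = pd.DataFrame(
    [[0, 0, 1, 0, 1, 0], [3, 1, 4, 0, 7, 1]],
    index=["00:00:00", "09:00:00"],
    columns=columns,
)

frames = split_by_top_level(demo)
print(sorted(frames))
print(frames["TOTAL"])
```

Because the split relies on the header names rather than on column positions, it should keep working when the blocks move around between spreadsheets, as long as the two header rows are still read correctly.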

Test code:

frames = get_multi_frame_excel('SO_split_df.xlsx', skiprows=1)
for name, frame in frames.items():
    print('---')
    print(name)
    print(frame)

Results:

---
DIRECTION_A
Time      A  B  C  D  E  F  G  H
00:00:00  0  0  0  0  0  0  0  0
00:15:00  0  0  0  0  0  0  0  0
00:30:00  0  0  0  0  0  0  0  0
....
09:00:00  3  1  0  0  0  0  1  5
09:15:00  1  0  0  0  0  0  1  2
09:30:00  1  0  0  0  0  0  1  2
---
TOTAL
Time       A  B  C  D  E  F  G   H
00:00:00   1  0  0  0  0  0  0   1
00:15:00   0  0  0  0  0  0  0   0
00:30:00   0  0  0  0  0  0  0   0
....
09:00:00   7  1  0  0  0  0  1   9
09:15:00   4  0  0  0  0  0  3   7
09:30:00   3  0  0  0  0  0  1   4
---
DIRECTION_B
Time       A  B  C  D  E  F  G   H
00:00:00   1  0  0  0  0  0  0   1
00:15:00   0  0  0  0  0  0  0   0
00:30:00   0  0  0  0  0  0  0   0
....
09:00:00   4  0  0  0  0  0  0   4
09:15:00   3  0  0  0  0  0  2   5
09:30:00   2  0  0  0  0  0  0   2
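
In the rows printed above, each TOTAL value equals the sum of the DIRECTION_A and DIRECTION_B values. Assuming that relationship is meant to hold for the whole sheet (the original post does not say so), it gives a cheap sanity check that the split picked up the right blocks when iterating over many spreadsheets. The helper check_total below is a hypothetical sketch; the demo frames reuse two rows from the output above.

```python
import pandas as pd


def check_total(frames, a="DIRECTION_A", b="DIRECTION_B", total="TOTAL"):
    """Return the rows where a + b differs from total (empty if consistent)."""
    diff = frames[a] + frames[b] - frames[total]
    return diff[(diff != 0).any(axis=1)]


# Two rows copied from the printed results, rebuilt by hand for the demo
cols = list("ABCDEFGH")
idx = ["09:00:00", "09:15:00"]
frames = {
    "DIRECTION_A": pd.DataFrame(
        [[3, 1, 0, 0, 0, 0, 1, 5], [1, 0, 0, 0, 0, 0, 1, 2]],
        index=idx, columns=cols),
    "DIRECTION_B": pd.DataFrame(
        [[4, 0, 0, 0, 0, 0, 0, 4], [3, 0, 0, 0, 0, 0, 2, 5]],
        index=idx, columns=cols),
    "TOTAL": pd.DataFrame(
        [[7, 1, 0, 0, 0, 0, 1, 9], [4, 0, 0, 0, 0, 0, 3, 7]],
        index=idx, columns=cols),
}

mismatches = check_total(frames)
print(mismatches.empty)
```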