Slice of a pandas DataFrame with a MultiIndex keeps too many level values

Asked: 2015-04-18 12:09:27

Tags: python pandas

My DataFrame was sliced from a larger DataFrame:

df
Out[47]: 
                           price  log_price  dlog_price
data_source_id trade_date                              
1              2014-03-05  174.4   5.161352   -2.089993

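As you can see, the DataFrame has a single row.

However, its index still carries thousands of level values, which appear to be inherited from the parent DataFrame: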

df.index
Out[48]: 
MultiIndex(levels=[[1, 2, 4, 5, 6, 7, 8, 9], [1990-01-01 00:00:00, 1990-01-02 00:00:00, 1990-01-03 00:00:00, 1990-01-04 00:00:00, 1990-01-05 00:00:00, 1990-01-08 00:00:00, 1990-01-09 00:00:00, 1990-01-10 00:00:00, 1990-01-11 00:00:00, 1990-01-12 00:00:00, 1990-01-15 00:00:00, 1990-01-16 00:00:00, 1990-01-17 00:00:00, 1990-01-18 00:00:00, 1990-01-19 00:00:00, 1990-01-22 00:00:00, 1990-01-23 00:00:00, 1990-01-24 00:00:00, 1990-01-25 00:00:00, 1990-01-26 00:00:00, 1990-01-29 00:00:00, 1990-01-30 00:00:00, 1990-01-31 00:00:00, 1990-02-01 00:00:00, 1990-02-02 00:00:00, 1990-02-05 00:00:00, 1990-02-06 00:00:00, 1990-02-07 00:00:00, 1990-02-08 00:00:00, 1990-02-09 00:00:00, 1990-02-12 00:00:00, 1990-02-13 00:00:00, 1990-02-14 00:00:00, 1990-02-15 00:00:00, 1990-02-16 00:00:00, 1990-02-19 00:00:00, 1990-02-20 00:00:00, 1990-02-21 00:00:00, 1990-02-22 00:00:00, 1990-02-23 00:00:00, 1990-02-26 00:00:00, 1990-02-27 00:00:00, 1990-02-28 00:00:00, 1990-03-01 00:00:00, 1990-03-02 00:00:00, 1990-03-05 00:00:00, 1990-03-06 00:00:00, 1990-03-07 00:00:00, 1990-03-08 00:00:00, 1990-03-09 00:00:00, 1990-03-12 00:00:00, 1990-03-13 00:00:00, 1990-03-14 00:00:00, 1990-03-15 00:00:00, 1990-03-16 00:00:00, 1990-03-19 00:00:00, 1990-03-20 00:00:00, 1990-03-21 00:00:00, 1990-03-22 00:00:00, 1990-03-23 00:00:00, 1990-03-26 00:00:00, 1990-03-27 00:00:00, 1990-03-28 00:00:00, 1990-03-29 00:00:00, 1990-03-30 00:00:00, 1990-04-02 00:00:00, 1990-04-03 00:00:00, 1990-04-04 00:00:00, 1990-04-05 00:00:00, 1990-04-06 00:00:00, 1990-04-09 00:00:00, 1990-04-10 00:00:00, 1990-04-11 00:00:00, 1990-04-12 00:00:00, 1990-04-13 00:00:00, 1990-04-16 00:00:00, 1990-04-17 00:00:00, 1990-04-18 00:00:00, 1990-04-19 00:00:00, 1990-04-20 00:00:00, 1990-04-23 00:00:00, 1990-04-24 00:00:00, 1990-04-25 00:00:00, 1990-04-26 00:00:00, 1990-04-27 00:00:00, 1990-04-30 00:00:00, 1990-05-01 00:00:00, 1990-05-02 00:00:00, 1990-05-03 00:00:00, 1990-05-04 00:00:00, 1990-05-07 00:00:00, 1990-05-08 00:00:00, 1990-05-09 00:00:00, 
1990-05-10 00:00:00, 1990-05-11 00:00:00, 1990-05-14 00:00:00, 1990-05-15 00:00:00, 1990-05-16 00:00:00, 1990-05-17 00:00:00, 1990-05-18 00:00:00, ...]],
           labels=[[0], [6308]],
           names=['data_source_id', 'trade_date'])

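How can I clean up the MultiIndex so that its levels contain only the values actually used?

This seems to work, but it is a bit messy: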

df2 = df.reset_index().set_index( df.index.names )

df2.index
Out[53]: 
MultiIndex(levels=[[1], [2014-03-05 00:00:00]],
           labels=[[0], [0]],
           names=['data_source_id', 'trade_date'])

1 Answer:

Answer 0 (score: 0)

You can rebuild the index from its own values:

df.index = pd.MultiIndex.from_tuples(df.index.values, names=df.index.names)

Or:

>>> arr = list(map(df.index.get_level_values, range(df.index.nlevels)))
>>> df.index = pd.MultiIndex.from_arrays(arr)
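
For readers on a newer pandas than the one in this question, `MultiIndex.remove_unused_levels()` (available in pandas 0.20 and later, so after this thread was written) does the same cleanup directly. A minimal sketch on hypothetical data, assuming a frame shaped like the asker's:

```python
import pandas as pd

# Hypothetical data standing in for the asker's frame.
dates = pd.date_range("1990-01-01", periods=4, freq="B")
idx = pd.MultiIndex.from_product(
    [[1, 2], dates], names=["data_source_id", "trade_date"]
)
parent = pd.DataFrame({"price": [float(i) for i in range(8)]}, index=idx)

# One-row slice; its index still stores all level values of the parent.
df = parent.iloc[[2]].copy()

# Returns a new MultiIndex whose levels hold only the values still in use.
df.index = df.index.remove_unused_levels()

level_sizes = [len(level) for level in df.index.levels]
print(level_sizes)
```

The row labels and level names are unchanged; only the unused level values are dropped. Note that recent pandas versions display `codes=` where the output in this thread shows `labels=`.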