My DataFrame is a slice taken from a larger DataFrame:
df
Out[47]:
                           price  log_price  dlog_price
data_source_id trade_date
1              2014-03-05  174.4   5.161352   -2.089993
As you can see, the DataFrame has one row.
However, the index still holds thousands of level values, apparently inherited from the parent:
df.index
Out[48]:
MultiIndex(levels=[[1, 2, 4, 5, 6, 7, 8, 9], [1990-01-01 00:00:00, 1990-01-02 00:00:00, 1990-01-03 00:00:00, 1990-01-04 00:00:00, 1990-01-05 00:00:00, 1990-01-08 00:00:00, 1990-01-09 00:00:00, 1990-01-10 00:00:00, 1990-01-11 00:00:00, 1990-01-12 00:00:00, 1990-01-15 00:00:00, 1990-01-16 00:00:00, 1990-01-17 00:00:00, 1990-01-18 00:00:00, 1990-01-19 00:00:00, 1990-01-22 00:00:00, 1990-01-23 00:00:00, 1990-01-24 00:00:00, 1990-01-25 00:00:00, 1990-01-26 00:00:00, 1990-01-29 00:00:00, 1990-01-30 00:00:00, 1990-01-31 00:00:00, 1990-02-01 00:00:00, 1990-02-02 00:00:00, 1990-02-05 00:00:00, 1990-02-06 00:00:00, 1990-02-07 00:00:00, 1990-02-08 00:00:00, 1990-02-09 00:00:00, 1990-02-12 00:00:00, 1990-02-13 00:00:00, 1990-02-14 00:00:00, 1990-02-15 00:00:00, 1990-02-16 00:00:00, 1990-02-19 00:00:00, 1990-02-20 00:00:00, 1990-02-21 00:00:00, 1990-02-22 00:00:00, 1990-02-23 00:00:00, 1990-02-26 00:00:00, 1990-02-27 00:00:00, 1990-02-28 00:00:00, 1990-03-01 00:00:00, 1990-03-02 00:00:00, 1990-03-05 00:00:00, 1990-03-06 00:00:00, 1990-03-07 00:00:00, 1990-03-08 00:00:00, 1990-03-09 00:00:00, 1990-03-12 00:00:00, 1990-03-13 00:00:00, 1990-03-14 00:00:00, 1990-03-15 00:00:00, 1990-03-16 00:00:00, 1990-03-19 00:00:00, 1990-03-20 00:00:00, 1990-03-21 00:00:00, 1990-03-22 00:00:00, 1990-03-23 00:00:00, 1990-03-26 00:00:00, 1990-03-27 00:00:00, 1990-03-28 00:00:00, 1990-03-29 00:00:00, 1990-03-30 00:00:00, 1990-04-02 00:00:00, 1990-04-03 00:00:00, 1990-04-04 00:00:00, 1990-04-05 00:00:00, 1990-04-06 00:00:00, 1990-04-09 00:00:00, 1990-04-10 00:00:00, 1990-04-11 00:00:00, 1990-04-12 00:00:00, 1990-04-13 00:00:00, 1990-04-16 00:00:00, 1990-04-17 00:00:00, 1990-04-18 00:00:00, 1990-04-19 00:00:00, 1990-04-20 00:00:00, 1990-04-23 00:00:00, 1990-04-24 00:00:00, 1990-04-25 00:00:00, 1990-04-26 00:00:00, 1990-04-27 00:00:00, 1990-04-30 00:00:00, 1990-05-01 00:00:00, 1990-05-02 00:00:00, 1990-05-03 00:00:00, 1990-05-04 00:00:00, 1990-05-07 00:00:00, 1990-05-08 00:00:00, 1990-05-09 00:00:00, 
1990-05-10 00:00:00, 1990-05-11 00:00:00, 1990-05-14 00:00:00, 1990-05-15 00:00:00, 1990-05-16 00:00:00, 1990-05-17 00:00:00, 1990-05-18 00:00:00, ...]],
labels=[[0], [6308]],
names=['data_source_id', 'trade_date'])
How can I clean up the MultiIndex so that it does not carry so many level values?
This seems to work, but it is a bit messy:
df2 = df.reset_index().set_index(df.index.names)
df2.index
Out[53]:
MultiIndex(levels=[[1], [2014-03-05 00:00:00]],
labels=[[0], [0]],
names=['data_source_id', 'trade_date'])
Answer 0 (score: 0)
You can rebuild the index from its own values, so that only the entries actually present are kept:
df.index = pd.MultiIndex.from_tuples(df.index.values, names=df.index.names)
Or, rebuilding it from the values of each level:
>>> arr = list(map(df.index.get_level_values, range(df.index.nlevels)))
>>> df.index = pd.MultiIndex.from_arrays(arr, names=df.index.names)
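Later pandas releases (0.20 and up) also provide MultiIndex.remove_unused_levels(), which may be the tidiest option if your version has it. The sketch below is only an illustration: the small parent frame and its values are made up to stand in for the real data, and it assumes a reasonably recent pandas (where the labels attribute shown in the output above has been renamed to codes).

```python
import pandas as pd

# Hypothetical stand-in for the large parent frame: 2 sources x 3 dates.
idx = pd.MultiIndex.from_product(
    [[1, 2], pd.to_datetime(["2014-03-03", "2014-03-04", "2014-03-05"])],
    names=["data_source_id", "trade_date"],
)
parent = pd.DataFrame(
    {"price": [170.0, 172.1, 174.4, 50.0, 51.0, 52.0]}, index=idx
)

# A one-row slice; its index still carries every level value of the parent.
df = parent.iloc[[2]].copy()
n_before = [len(level) for level in df.index.levels]

# Rebuild the levels so they hold only the values that are actually used.
df.index = df.index.remove_unused_levels()
n_after = [len(level) for level in df.index.levels]

print(n_before, n_after)
```

The row itself and the index names are left unchanged; only the stored level values shrink.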