Renaming levels in a multiindex / multilevel Pandas DataFrame

Date: 2017-08-20 20:18:35

Tags: python pandas

I have a DataFrame like this:

import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02'])]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays)

df.rename(columns={0: 'm1', 1: 'm2', 2: 'm3', 3: 'm4'}, inplace=True)
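For reference, the dates that need changing live in the third, unnamed level of the row index (level 2). A quick way to inspect that structure is sketched below; it uses zeros instead of random values, on the assumption that only the index layout matters here.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['2016-01', '2016-02'] * 4)]
# Zeros instead of random values: only the index structure matters here.
df = pd.DataFrame(np.zeros((8, 4)), index=arrays, columns=['m1', 'm2', 'm3', 'm4'])

print(df.index.nlevels)                                # 3
print(list(df.index.names))                            # [None, None, None]
print(df.index.get_level_values(2).unique().tolist())  # ['2016-01', '2016-02']
# reset_index() turns the unnamed levels into columns level_0, level_1, level_2.
print(df.reset_index().columns.tolist())
```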

                  m1   m2   m3   m4
bar one 2016-01 -0.0  1.0  3.0  2.0
    two 2016-02  1.0  1.0  1.0  2.0
baz one 2016-01 -1.0 -1.0  2.0  1.0
    two 2016-02  1.0  2.0  1.0  2.0
foo one 2016-01  1.0 -0.0 -0.0 -0.0
    two 2016-02 -2.0 -0.0 -0.0 -0.0
qux one 2016-01 -0.0 -0.0 -1.0  1.0
    two 2016-02 -0.0 -0.0  1.0 -0.0

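Suppose I want to replace all the 2016s with 2017 for columns m2 and m4, so that the 2016 rows keep the values of m1 and m3 but not those of m2 and m4. The 2017 rows would then have the values of m2 and m4, but not those of m1 and m3. Something like this DataFrame: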

                  m1   m2   m3   m4
bar one 2016-01 -0.0  0.0  3.0  0.0
    two 2016-02  1.0  0.0  1.0  0.0
    one 2017-01  0.0  1.0  0.0  2.0
    two 2017-02  0.0  1.0  0.0  2.0
baz one 2016-01 -1.0  0.0  2.0  0.0
    two 2016-02  1.0  0.0  1.0  0.0
    one 2017-01  0.0 -1.0  0.0  1.0
    two 2017-02  0.0  2.0  0.0  2.0

I have tried to unstack() the DataFrame and rename each column, but this doesn't seem to work and I'm not sure why:

df = df.unstack()
df.unstack()['m2'] = df.unstack()['m2'].rename(columns = lambda t: t.replace('2016','2017'))
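One likely reason the attempt has no effect: every call to df.unstack() builds a new DataFrame, so the left-hand side df.unstack()['m2'] = ... writes into a temporary object that is discarded immediately, and rename() likewise returns a new object instead of changing the one it is called on. The sketch below uses a small four-row stand-in for the question's data (an assumption; the values are arbitrary) to show that the renamed labels exist only in the returned object.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data; values are arbitrary.
arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
          np.array(['one', 'two', 'one', 'two']),
          np.array(['2016-01', '2016-02', '2016-01', '2016-02'])]
df = pd.DataFrame(np.arange(16.0).reshape(4, 4), index=arrays,
                  columns=['m1', 'm2', 'm3', 'm4'])

wide = df.unstack()  # moves the date level into the columns: (metric, date)

# unstack() returns a fresh object on every call, so these are two different frames.
print(wide.unstack() is wide.unstack())  # False

# rename() also returns a new DataFrame; `wide` keeps its old labels.
renamed_m2 = wide['m2'].rename(columns=lambda t: t.replace('2016', '2017'))
print(list(renamed_m2.columns))  # ['2017-01', '2017-02']
print(list(wide['m2'].columns))  # ['2016-01', '2016-02']
```

In other words, the renamed piece has to be bound to a name and combined back into a result explicitly.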

2 answers:

Answer 0 (score: 1)

import numpy as np
import pandas as pd
np.random.seed(2017)

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02'])]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays)

df.rename(columns={0:'m1',1:'m2',2:'m3',3:'m4'},inplace=True)

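# Take m2/m4 and move their rows to 2017 by rewriting the last index level.
df2 = df[['m2', 'm4']].copy()
df2.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values(i) for i in [0, 1]]
    + [df.index.get_level_values(-1).str.replace('2016', '2017')])

# Stack the m1/m3 (2016) rows on top of the m2/m4 (2017) rows; missing cells become 0.
result = pd.concat([df[['m1', 'm3']], df2], axis=0).fillna(0)
# Order rows by name, then date, then one/two, and restore the m1..m4 column order
# (newer pandas versions no longer sort columns when concatenating).
result = result.sort_index(level=[0, 2, 1])[['m1', 'm2', 'm3', 'm4']]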
print(result)

This transforms df (first table below) into result (second table):

                  m1   m2   m3   m4
bar one 2016-01 -1.0 -0.0  1.0  1.0
    two 2016-02 -0.0 -0.0 -0.0 -0.0
baz one 2016-01  1.0 -0.0 -1.0 -0.0
    two 2016-02 -1.0  1.0  1.0 -0.0
foo one 2016-01 -0.0 -0.0 -1.0 -1.0
    two 2016-02  2.0 -0.0 -0.0 -0.0
qux one 2016-01  1.0  2.0 -0.0  2.0
    two 2016-02  1.0  1.0 -0.0 -0.0

                  m1   m2   m3   m4
bar one 2016-01 -1.0  0.0  1.0  0.0
    two 2016-02 -0.0  0.0 -0.0  0.0
    one 2017-01  0.0 -0.0  0.0  1.0
    two 2017-02  0.0 -0.0  0.0 -0.0
baz one 2016-01  1.0  0.0 -1.0  0.0
    two 2016-02 -1.0  0.0  1.0  0.0
    one 2017-01  0.0 -0.0  0.0 -0.0
    two 2017-02  0.0  1.0  0.0 -0.0
foo one 2016-01 -0.0  0.0 -1.0  0.0
    two 2016-02  2.0  0.0 -0.0  0.0
    one 2017-01  0.0 -0.0  0.0 -1.0
    two 2017-02  0.0 -0.0  0.0 -0.0
qux one 2016-01  1.0  0.0 -0.0  0.0
    two 2016-02  1.0  0.0 -0.0  0.0
    one 2017-01  0.0  2.0  0.0  2.0
    two 2017-02  0.0  1.0  0.0 -0.0
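A possibly shorter variant of the same idea, assuming a pandas version whose DataFrame.rename accepts a level argument: relabel only index level 2 of the m2/m4 slice instead of rebuilding the whole MultiIndex with from_arrays. The trailing reindex(columns=...) is there because newer pandas versions keep the concat column order (m1, m3, m2, m4) rather than sorting it.

```python
import numpy as np
import pandas as pd

np.random.seed(2017)
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['2016-01', '2016-02'] * 4)]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays,
                  columns=['m1', 'm2', 'm3', 'm4'])

# Relabel only index level 2 (the date level) of the m2/m4 slice: 2016 -> 2017.
moved = df[['m2', 'm4']].rename(index=lambda s: s.replace('2016', '2017'), level=2)

# Combine the m1/m3 rows (2016) with the relabelled m2/m4 rows (2017);
# cells present in only one part are NaN after concat and become 0.
result = (pd.concat([df[['m1', 'm3']], moved])
            .fillna(0)
            .sort_index(level=[0, 2, 1])  # name, then date, then one/two
            .reindex(columns=['m1', 'm2', 'm3', 'm4']))
print(result)
```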

Answer 1 (score: 0)

I'm not sure I understood your question; here is what I did and the output.

import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02'])]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays)

df.rename(columns={0: 'm1', 1: 'm2', 2: 'm3', 3: 'm4'}, inplace=True)
# reset_index() turns the unnamed index levels into columns level_0, level_1, level_2
df = df.reset_index()
df['level_2'] = df['level_2'].str.replace("2016", "2017")

This gives me the output:

  level_0 level_1  level_2   m1   m2   m3   m4
0     bar     one  2017-01 -0.0 -1.0 -0.0 -0.0
1     bar     two  2017-02 -0.0 -1.0  2.0  2.0
2     baz     one  2017-01 -2.0  1.0 -0.0  1.0
3     baz     two  2017-02 -0.0  1.0 -1.0  2.0
4     foo     one  2017-01  1.0 -0.0 -1.0 -0.0
5     foo     two  2017-02 -1.0 -2.0  1.0 -0.0
6     qux     one  2017-01  1.0  1.0 -0.0  1.0
7     qux     two  2017-02  1.0 -1.0  2.0 -1.0

If you can tell me what you expect based on this, I'll revise my answer.
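If the goal is the expected output shown in the question rather than relabelling every row, one way to stay with the reset_index approach above might be to keep the original rows for m1/m3 and append copies carrying m2/m4 with 2016 replaced by 2017. This split into "earlier" and "later" frames is an assumption about the intended result, read off the question's expected table; the names below are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(2017)
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['2016-01', '2016-02'] * 4)]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays,
                  columns=['m1', 'm2', 'm3', 'm4'])

keys = ['level_0', 'level_1', 'level_2']
flat = df.reset_index()  # unnamed index levels become level_0..level_2

# 2016 rows keep only m1/m3.
earlier = flat[keys + ['m1', 'm3']]

# 2017 rows: a copy carrying m2/m4 with 2016 replaced by 2017 in the date column.
later = flat[keys + ['m2', 'm4']].copy()
later['level_2'] = later['level_2'].str.replace('2016', '2017')

result = (pd.concat([earlier, later], ignore_index=True)
            .fillna(0)                     # cells missing from one part become 0
            .set_index(keys)               # back to a three-level row index
            .sort_index(level=[0, 2, 1])   # name, then date, then one/two
            .reindex(columns=['m1', 'm2', 'm3', 'm4']))
print(result)
```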