Replacing values in a pandas MultiIndex

Time: 2016-03-20 08:39:20

Tags: python pandas dataframe multi-index

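I have a DataFrame with a MultiIndex. I want to change the values of the second index level wherever the first level meets a certain condition. I found a similar (but different) question here: Replace a value in MultiIndex (pandas). It does not answer my question, because it is about changing a single row, and its solution also passes in the value of the first index level (which does not need to change). In my case I am dealing with multiple rows, and I could not adapt that solution.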

A minimal example of my data is below. Thanks!

import pandas as pd
import numpy as np

consdf=pd.DataFrame()

for mylocation in ['North','South']:
    for scenario in np.arange(1,4):
        df= pd.DataFrame()
        df['mylocation'] = [mylocation]
        df['scenario']= [scenario]
        df['this'] = np.random.randint(10,100)
        df['that'] = df['this']  * 2
        df['something else']  = df['this'] * 3
        consdf=pd.concat((consdf, df ), axis=0, ignore_index=True)

mypiv = consdf.pivot(index='mylocation', columns='scenario').transpose()

level_list =['this','that']
# if level 0 is in level_list --> set level 1 to np.nan
mypiv.iloc[mypiv.index.get_level_values(0).isin(level_list)].index.set_levels([np.nan], level =1, inplace=True)

The last line does not work. I get:

ValueError: On level 1, label max (2) >= length of level  (1). NOTE: this index is in an inconsistent state
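
For context, a minimal sketch of why this error appears, using a small stand-in index rather than the frame above (the data here is hypothetical). A MultiIndex stores each level as a list of unique values plus per-row integer codes pointing into that list, so shrinking a level to a single value leaves the existing codes out of range. Separately, mypiv.iloc[...] returns a new object, so changing its index would not modify mypiv in any case.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in index, not the question's data
idx = pd.MultiIndex.from_product([["this", "that"], [1, 2, 3]])

print(idx.levels[1].tolist())  # unique values of level 1: [1, 2, 3]
print(idx.codes[1].tolist())   # per-row positions into them: [0, 1, 2, 0, 1, 2]

# A one-element level cannot satisfy codes 1 and 2, so pandas refuses.
try:
    idx.set_levels([np.nan], level=1)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```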

2 answers:

Answer 0: (score: 1)

Note: ix is deprecated in pandas 0.20+ and was removed in pandas 1.0. Use the loc accessor instead.

Here is a solution using the reset_index() method:

In [95]: new = mypiv.reset_index()

In [96]: new
Out[96]:
mylocation         level_0  scenario  North  South
0                     this         1     32     64
1                     this         2     18     40
2                     this         3     76     56
3                     that         1     64    128
4                     that         2     36     80
5                     that         3    152    112
6           something else         1     96    192
7           something else         2     54    120
8           something else         3    228    168

In [100]: new.loc[new.level_0.isin(level_list), 'scenario'] = np.nan

In [101]: new
Out[101]:
mylocation         level_0  scenario  North  South
0                     this       NaN     32     64
1                     this       NaN     18     40
2                     this       NaN     76     56
3                     that       NaN     64    128
4                     that       NaN     36     80
5                     that       NaN    152    112
6           something else       1.0     96    192
7           something else       2.0     54    120
8           something else       3.0    228    168

In [103]: mypiv = new.set_index(['level_0', 'scenario'])

In [104]: mypiv
Out[104]:
mylocation               North  South
level_0        scenario
this           NaN          32     64
               NaN          18     40
               NaN          76     56
that           NaN          64    128
               NaN          36     80
               NaN         152    112
something else 1.0          96    192
               2.0          54    120
               3.0         228    168

But I suspect there is a more elegant solution.
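
The reset_index() round trip above can be sketched as one self-contained script. The frame below is a hypothetical stand-in for mypiv with made-up numbers, and the explicit cast to float is an addition of this sketch (not part of the answer) so that assigning NaN does not depend on an implicit integer-to-float upcast.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mypiv: same index layout, made-up values
index = pd.MultiIndex.from_product(
    [["this", "that", "something else"], [1, 2, 3]],
    names=["level_0", "scenario"],
)
mypiv = pd.DataFrame({"North": range(9), "South": range(10, 19)}, index=index)
level_list = ["this", "that"]

# Move both index levels into ordinary columns
new = mypiv.reset_index()

# Assumption of this sketch: cast up front so NaN fits the column dtype
new["scenario"] = new["scenario"].astype(float)

# Blank out 'scenario' wherever the first level is in level_list
new.loc[new["level_0"].isin(level_list), "scenario"] = np.nan

# Rebuild the two-level index
result = new.set_index(["level_0", "scenario"])
print(result)
```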

Answer 1: (score: 1)

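IIUC, you can add a new value to the level's values and then change the index labels using advanced indexing together with the get_level_values, set_levels and set_labels methods (set_labels and labels were renamed set_codes and codes in pandas 0.24):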

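# Number of rows whose level-0 label is in level_list (assumed to come first in the index)
len_ind = len(mypiv.loc[(level_list,)].index.get_level_values(1))
# Later pandas: set_codes/codes replace set_labels/labels, and inplace= is gone, so reassign
new_index = mypiv.index.set_levels([1, 2, 3, np.nan], level=1)
new_index = new_index.set_codes([3]*len_ind + new_index.codes[1][len_ind:].tolist(), level=1)
mypiv.index = new_index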

In [219]: mypiv
Out[219]: 
mylocation               North  South
               scenario              
this           NaN          26     46
               NaN          32     67
               NaN          75     30
that           NaN          52     92
               NaN          64    134
               NaN         150     60
something else 1.0          78    138
               2.0          96    201
               3.0         225     90

Note that the other scenario values are converted to float, because a level must have a single dtype and np.nan is a float.
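
The approach above relies on the matching rows coming first in the index. As an alternative sketch (a different technique from the answer's, shown on a hypothetical stand-in frame), the index can be rebuilt from its level values with a boolean mask, which works regardless of row order. It also shows the float conversion mentioned in the note.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mypiv: same index layout, made-up values
index = pd.MultiIndex.from_product(
    [["this", "that", "something else"], [1, 2, 3]],
    names=[None, "scenario"],
)
mypiv = pd.DataFrame({"North": range(9), "South": range(10, 19)}, index=index)
level_list = ["this", "that"]

level0 = mypiv.index.get_level_values(0)
level1 = mypiv.index.get_level_values(1)

# Float copy of level 1, with NaN wherever level 0 is in level_list
mask = level0.isin(level_list)
new_level1 = np.where(mask, np.nan, level1.to_numpy(dtype=float))

# Rebuild the index from the two arrays, keeping the original level names
mypiv.index = pd.MultiIndex.from_arrays(
    [level0, new_level1], names=mypiv.index.names
)
print(mypiv)
```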