How to add a row to a pandas DataFrame without flattening the MultiIndex

Time: 2017-07-06 13:11:51

Tags: python pandas dataframe

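I can't find an efficient way to add a single row to a MultiIndexed DataFrame. When I add a row, the MultiIndex is flattened into a plain index of tuples. Oddly, this is not a problem for MultiIndexed columns.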

System information:

Python 3.6.1 |Continuum Analytics, Inc.| (default, Mar 22 2017, 19:25:17) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.19.2'

Example data: a DataFrame with MultiIndex rows and columns

import numpy as np
import pandas as pd

# Note: from pandas 0.24 onward the 'labels' argument is named 'codes'.
index = pd.MultiIndex(levels=[['bar', 'foo'], ['one', 'two']],
                      labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
                      names=['row_0', 'row_1'])
columns = pd.MultiIndex(levels=[['dull', 'shiny'], ['a', 'b']],
                      labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
                      names=['col_0', 'col_1'])
df = pd.DataFrame(np.ones((4,4)),columns=columns, index=index)

print(df)

col_0       dull      shiny
col_1          a    b     a    b
row_0 row_1                     
bar   one    1.0  1.0   1.0  1.0
      two    1.0  1.0   1.0  1.0
foo   one    1.0  1.0   1.0  1.0
      two    1.0  1.0   1.0  1.0

Adding another column to the DataFrame works without any problem:

df['last_col'] = 42 #define a new column and assign a value

print(df)

col_0       dull      shiny      last_col
col_1          a    b     a    b         
row_0 row_1                              
bar   one    1.0  1.0   1.0  1.0       42
      two    1.0  1.0   1.0  1.0       42
foo   one    1.0  1.0   1.0  1.0       42
      two    1.0  1.0   1.0  1.0       42

However, if I do the same thing to add a row (using loc), the MultiIndex is flattened into a plain index of tuples:

df.loc['last_row'] = 43  #define a new row and assign a value

print(df)

col_0       dull       shiny       last_col
col_1          a     b     a     b         
(bar, one)   1.0   1.0   1.0   1.0       42
(bar, two)   1.0   1.0   1.0   1.0       42
(foo, one)   1.0   1.0   1.0   1.0       42
(foo, two)   1.0   1.0   1.0   1.0       42
last_row    43.0  43.0  43.0  43.0       43

Does anyone know a simple and efficient way to add a row without flattening the index? Many thanks!

1 Answer:

Answer 0 (score: 2)

I think you need to use a tuple that specifies both levels of the MultiIndex:

df.loc[('last_row', 'a'), :] = 43
print(df)
col_0           dull       shiny      
col_1              a     b     a     b
row_0    row_1                        
bar      one     1.0   1.0   1.0   1.0
         two     1.0   1.0   1.0   1.0
foo      one     1.0   1.0   1.0   1.0
         two     1.0   1.0   1.0   1.0
last_row a      43.0  43.0  43.0  43.0
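
A self-contained sketch of the tuple approach above, for readers who want to verify the result programmatically. It rebuilds the example frame with MultiIndex.from_product, which is an assumption made for portability (the labels argument used in the question is not available in recent pandas releases), and then checks that the row index still has two levels after the new row is added.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame (from_product is used here for portability;
# it yields the same row and column labels as the question's constructor).
index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']],
                                   names=['row_0', 'row_1'])
columns = pd.MultiIndex.from_product([['dull', 'shiny'], ['a', 'b']],
                                     names=['col_0', 'col_1'])
df = pd.DataFrame(np.ones((4, 4)), index=index, columns=columns)

# Give one label per index level, plus ':' to address every column.
df.loc[('last_row', 'a'), :] = 43

# The index should still be a two-level MultiIndex.
print(type(df.index).__name__, df.index.nlevels)
print(df.index[-1])
```

The explicit ':' mirrors the answer's form; whether other spellings of the key behave the same way may depend on the pandas version.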

For columns, it works similarly:

df[('last_col', 'a')] = 43
print(df)
col_0       dull      shiny      last_col
col_1          a    b     a    b        a
row_0 row_1                              
bar   one    1.0  1.0   1.0  1.0       43
      two    1.0  1.0   1.0  1.0       43
foo   one    1.0  1.0   1.0  1.0       43
      two    1.0  1.0   1.0  1.0       43
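
The column case can be sketched the same way. The example below contrasts a full tuple key with a bare label; other_col is a hypothetical name introduced only for illustration. With a bare label, pandas appears to pad the missing lower level with an empty string, which matches the blank second level under last_col in the question's output.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question, rebuilt with from_product.
index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']],
                                   names=['row_0', 'row_1'])
columns = pd.MultiIndex.from_product([['dull', 'shiny'], ['a', 'b']],
                                     names=['col_0', 'col_1'])
df = pd.DataFrame(np.ones((4, 4)), index=index, columns=columns)

# Full tuple: both column levels are given explicitly.
df[('last_col', 'a')] = 43

# Bare label (hypothetical column name): the second level is left unspecified.
df['other_col'] = 42

# Inspect the two newly added column keys.
print(df.columns.tolist()[-2:])
print(type(df.columns).__name__, df.columns.nlevels)
```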

Edit:

It seems you also need to specify the columns, using : if you want all of them:

df.loc['last_row',:] = 43
print(df)
col_0           dull       shiny      
col_1              a     b     a     b
row_0    row_1                        
bar      one     1.0   1.0   1.0   1.0
         two     1.0   1.0   1.0   1.0
foo      one     1.0   1.0   1.0   1.0
         two     1.0   1.0   1.0   1.0
last_row        43.0  43.0  43.0  43.0

If a level is not specified, an empty string is added for it:

print(df.index)
MultiIndex(levels=[['bar', 'foo', 'last_row'], ['one', 'two', '']],
           labels=[[0, 0, 1, 1, 2], [0, 1, 0, 1, 2]],
           names=['row_0', 'row_1'])
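
The empty-string padding can be observed in isolation with MultiIndex.insert, which pads a scalar label out to the full number of levels. This is only a sketch of the padding itself; that loc enlargement goes through the same code path is an assumption, not something the output above proves.

```python
import pandas as pd

# Row index from the question, rebuilt with from_product.
index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']],
                                   names=['row_0', 'row_1'])

# Insert a scalar label at the end; only the first level is specified.
padded = index.insert(len(index), 'last_row')

# The new entry is a full-length tuple whose second level is ''.
print(padded[-1])
print(padded.nlevels)
```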

Assigning to only some of the columns leaves the remaining ones as NaN:

df.loc['last_row','dull'] = 43
print(df)
col_0           dull       shiny     
col_1              a     b     a    b
row_0    row_1                       
bar      one     1.0   1.0   1.0  1.0
         two     1.0   1.0   1.0  1.0
foo      one     1.0   1.0   1.0  1.0
         two     1.0   1.0   1.0  1.0
last_row        43.0  43.0   NaN  NaN

The same applies when a single column is addressed with a full tuple:

df.loc['last_row', ('dull', 'a')] = 43
print(df)
col_0           dull      shiny     
col_1              a    b     a    b
row_0    row_1                      
bar      one     1.0  1.0   1.0  1.0
         two     1.0  1.0   1.0  1.0
foo      one     1.0  1.0   1.0  1.0
         two     1.0  1.0   1.0  1.0
last_row        43.0  NaN   NaN  NaN
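
As an alternative that does not rely on loc enlargement, a one-row DataFrame can be built with its own MultiIndex and appended with pd.concat. This is a different technique from the one in the answer, shown here as a sketch; the names new_index, new_row and result are illustrative, and the value 43.0 is taken from the examples above.

```python
import numpy as np
import pandas as pd

# Example frame from the question, rebuilt with from_product.
index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']],
                                   names=['row_0', 'row_1'])
columns = pd.MultiIndex.from_product([['dull', 'shiny'], ['a', 'b']],
                                     names=['col_0', 'col_1'])
df = pd.DataFrame(np.ones((4, 4)), index=index, columns=columns)

# Build a one-row frame whose index has the same levels and names as df.
new_index = pd.MultiIndex.from_tuples([('last_row', 'a')],
                                      names=df.index.names)
new_row = pd.DataFrame([[43.0] * df.shape[1]],
                       index=new_index, columns=df.columns)

# Concatenating two frames with matching MultiIndex structure keeps it intact.
result = pd.concat([df, new_row])
print(result)
```

Because concat returns a new object rather than modifying df in place, appending many rows one at a time this way can be slow; collecting the rows first and concatenating once is usually cheaper.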