Question

I use the following code to create a pandas DataFrame with a MultiIndex and update some values.

import pandas as pd
indices = pd.MultiIndex.from_product(iterables=[['X','Y','Z'],['h1','h2','h3']],names=['idx1','idx2'])
df = pd.DataFrame(0.0,index=indices,columns=['A'])

df.loc['X','A']['h1'] = 1.1
df.loc['Y','A']['h1'] = 2.2
df.loc['Z','A']['h1'] = 3.3

print(df)

This code produces the following output, as expected:

             A
idx1 idx2
X    h1    1.1
     h2    0.0
     h3    0.0
Y    h1    2.2
     h2    0.0
     h3    0.0
Z    h1    3.3
     h2    0.0
     h3    0.0

But when I use the following code (note that 'Y' in the first index level has been moved to the end), the output is wrong:

import pandas as pd
indices = pd.MultiIndex.from_product(iterables=[['X','Z','Y'],['h1','h2','h3']],names=['idx1','idx2'])
df = pd.DataFrame(0.0,index=indices,columns=['A'])

df.loc['X','A']['h1'] = 1.1
df.loc['Y','A']['h1'] = 2.2
df.loc['Z','A']['h1'] = 3.3

print(df)

             A
idx1 idx2
X    h1    0.0
     h2    0.0
     h3    0.0
Z    h1    0.0
     h2    0.0
     h3    0.0
Y    h1    0.0
     h2    0.0
     h3    0.0

What is wrong with the second snippet? Am I doing something really stupid, or is this some kind of expected behavior?

I tried Python 3.7.1 and 3.6.8, using pandas 0.23.4 in both cases.

Answer 1

Use a tuple to select the MultiIndex row when setting values, so that you avoid chained indexing:

df.loc[('X','h1'),'A'] = 1.1
df.loc[('Y','h1'),'A'] = 2.2
df.loc[('Z','h1'),'A'] = 3.3

print(df)
             A
idx1 idx2     
X    h1    1.1
     h2    0.0
     h3    0.0
Z    h1    3.3
     h2    0.0
     h3    0.0
Y    h1    2.2
     h2    0.0
     h3    0.0
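
To spell out why this helps: df.loc['X','A']['h1'] = 1.1 is really two steps. The first, df.loc['X','A'], returns a Series that may be either a view or a copy of the underlying data, and the second step assigns into that Series. If it is a copy, the assignment never reaches df, which would explain why the result changed when the index order changed. A single .loc call with a tuple key assigns directly into df. Below is a minimal sketch of the same fix written as a loop; the names updates and h1_values are only illustrative and not part of the original code.

```python
import pandas as pd

# Same unsorted MultiIndex as in the question ('Y' comes last).
indices = pd.MultiIndex.from_product(
    [['X', 'Z', 'Y'], ['h1', 'h2', 'h3']], names=['idx1', 'idx2'])
df = pd.DataFrame(0.0, index=indices, columns=['A'])

# Hypothetical mapping of first-level key -> new value for the 'h1' row.
updates = {'X': 1.1, 'Y': 2.2, 'Z': 3.3}

for key, value in updates.items():
    # One .loc call: the tuple picks the row, 'A' picks the column,
    # so the assignment is made on df itself and not on a temporary object.
    df.loc[(key, 'h1'), 'A'] = value

# Collect the 'h1' rows to check that every update was applied.
h1_values = df.xs('h1', level='idx2')['A'].to_dict()
print(h1_values)
```

Because row and column are selected in one step, this does not depend on whether the index is sorted.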
