Question

I've run into behavior of .loc on DataFrames with a multi-level index that I can't explain.

Setup:

import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'DT': [2018, 2018, 2017, 2018],
                   'F1': [0, 1, 0, 0],
                   'F2': [0, 0, 1, 0]})

df.loc[5] = [5, 2019, 1, 0]
df

So far, so good. It looks like this (note that the row with index 5 was inserted):

   ID    DT  F1  F2
0   1  2018   0   0
1   2  2018   1   0
2   3  2017   0   1
3   4  2018   0   0
5   5  2019   1   0

Now create a copy with a multi-level index on 'ID' and 'DT' and use .loc with it:

indexed = df.set_index(['ID', 'DT'], inplace=False)
indexed.loc[(2, 2018)]

This still works and outputs the values corresponding to the given index value:

F1    1
F2    0
Name: (2, 2018), dtype: int64

Updating also works:

indexed.loc[(2, 2018)] = [1, 4]

Now try to insert a new row the same way as above on the single-level index:

indexed.loc[(1, 2019)] = [3, 4]

This raises an exception:

ValueError: cannot set using a multi-index selection indexer with a different length than the value

And the DataFrame is modified anyway, as if the .loc access had interpreted 2019 as a column name.
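The impression that .loc treats 2019 as a column name is consistent with plain Python subscript semantics: writing obj[(1, 2019)] and obj[1, 2019] passes the identical tuple to __setitem__, so the parentheses give pandas no hint that (1, 2019) was meant as a single row label. The sketch below uses a hypothetical RecordingIndexer class (not part of pandas) that only records the keys it receives; it illustrates Python's behavior, not pandas internals.

```python
class RecordingIndexer:
    """Hypothetical stand-in for an indexer such as .loc.

    It does not store any data; it only records the key object that
    Python hands to __setitem__ for each assignment.
    """

    def __init__(self):
        self.keys = []

    def __setitem__(self, key, value):
        self.keys.append(key)


rec = RecordingIndexer()
rec[(1, 2019)] = [3, 4]      # parenthesised tuple, as in the question
rec[1, 2019] = [3, 4]        # the same key written without parentheses
rec[(1, 2019), :] = [3, 4]   # row tuple plus an explicit column slice

print(rec.keys[0] == rec.keys[1])  # True: Python cannot tell these apart
print(rec.keys[2])                 # ((1, 2019), slice(None, None, None))
```

In the last form the first element of the key is the whole tuple (1, 2019) and the second is a column slice, so 2019 can no longer be mistaken for a column label.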

Can anyone explain this strange behavior, or is it a bug?

Answer 1

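Add a trailing ":" so that all columns are selected for the new or updated row. Without ":", the assignment unfortunately only works for updating an existing row. With ":", the tuple (1, 2019) is unambiguously the row key; without it, the same tuple can be read as a (row, column) pair, which matches the behavior described in the question: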

indexed.loc[(2, 2018), :] = [1, 4]
indexed.loc[(1, 2019), :] = [3, 4]
print(indexed)

          F1   F2
ID DT            
1  2018  0.0  0.0
2  2018  1.0  4.0
3  2017  0.0  1.0
4  2018  0.0  0.0
5  2019  1.0  0.0
1  2019  3.0  4.0
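A self-contained version of this approach, as a sketch: it rebuilds the question's data directly (including the row the question adds with df.loc[5]) instead of running the setup step by step, then applies both assignments with the trailing ":". It assumes a pandas version in which setting with enlargement through .loc accepts a full MultiIndex tuple, as the output above shows.

```python
import pandas as pd

# Same data as in the question, with the row added via df.loc[5] included.
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'DT': [2018, 2018, 2017, 2018, 2019],
                   'F1': [0, 1, 0, 0, 1],
                   'F2': [0, 0, 1, 0, 0]})
indexed = df.set_index(['ID', 'DT'])

# Update an existing row: the trailing ':' makes (2, 2018) the row key
# and selects every column for the assigned values.
indexed.loc[(2, 2018), :] = [1, 4]

# Insert a new row (setting with enlargement) in exactly the same way.
indexed.loc[(1, 2019), :] = [3, 4]

print(indexed)
```

The dtypes printed may differ from the output above depending on the pandas version; in the answer's output, enlargement turned F1 and F2 into floats.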

Unexpected behavior of .loc on multi-level indexed DataFrames
