Select all rows with one sub-index in a Pandas MultiIndex DataFrame

Time: 2018-07-31 10:01:59

Tags: python pandas dataframe multi-index

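I have the following table in a Pandas DataFrame with a MultiIndex (row, attribute). I also have similar DataFrames that hold the values for "class" and "probability", but those have a single index (row).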

                    1   2   3   4   5   6   7   8   9   10  ...     69  70  71  72  73  74  75  76  77  78
row     attribute                                                                                   
0       class       -   -   -   -   -   -   -   -   -   -   ...     -   -   -   -   -   -   -   -   -   -
        probability -   -   -   -   -   -   -   -   -   -   ...     -   -   -   -   -   -   -   -   -   -
1       class       -   -   -   -   -   -   -   -   -   -   ...     -   -   -   -   -   -   -   -   -   -
        probability -   -   -   -   -   -   -   -   -   -   ...     -   -   -   -   -   -   -   -   -   -
2       class       -   -   -   -   -   -   -   -   -   -   ...     -   -   -   -   -   -   -   -   -   -
        probability -   -   -   -   -   -   -   -   -   -   ...     -   -   -   -   -   -   -   -   -   -

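How can I set the values of all rows where attribute == 'class' to the values from another DataFrame that has the right shape? The same goes for "probability". I tried the following: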

df.loc[df.attribute == "class"] = labels[sorted.values]

which results in:

AttributeError: 'DataFrame' object has no attribute 'attribute'

I am still quite new to MultiIndex, so any hints would be appreciated. Thanks a lot!

1 answer:

Answer 0 (score: 0)

I think you need:

df.loc[df.index.get_level_values("attribute") == "class"] = labels[sorted.values]
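For context, the original attempt fails because attribute is the name of an index level, not a column, so df.attribute has nothing to look up. A minimal sketch on a small made-up frame (the zero-filled data and the three columns are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Small stand-in for the frame in the question (made-up data).
mux = pd.MultiIndex.from_product([np.arange(3), ['class', 'probability']],
                                 names=('row', 'attribute'))
df = pd.DataFrame(np.zeros((6, 3), dtype=int), index=mux)

# 'attribute' lives in the index, so it is not among the columns.
print(df.index.names)
print('attribute' in df.columns)

# Comparing the level values gives one boolean per row,
# which .loc accepts as a row mask.
mask = df.index.get_level_values('attribute') == 'class'
print(mask)
```

The mask has one entry per row of df, so it can be reused for both reading and assigning.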

Sample:

import numpy as np
import pandas as pd

np.random.seed(789)
mux = pd.MultiIndex.from_product([np.arange(3), ['class','probability']],
                                  names=('row','attribute'))

df = pd.DataFrame(np.random.randint(10, size=(6, 10)), index=mux)
print(df)
                 0  1  2  3  4  5  6  7  8  9
row attribute                                
0   class        3  2  1  3  4  8  4  1  8  0
    probability  1  1  9  8  9  4  1  4  1  3
1   class        8  1  4  9  6  5  3  5  4  9
    probability  7  6  6  5  0  8  5  4  8  1
2   class        1  4  2  6  5  9  0  6  2  8
    probability  8  8  9  1  4  2  1  5  5  9

labels = pd.DataFrame(np.random.randint(2, size=(2, 10)), index=['class','probability'])
print(labels)
             0  1  2  3  4  5  6  7  8  9
class        0  0  1  1  0  0  0  0  0  1
probability  1  1  0  0  0  0  0  1  0  0

If you want to replace the values with a repeated row, use numpy.repeat:

mask = df.index.get_level_values("attribute") == "class"
df.loc[mask] = np.repeat(labels.loc[['class']].values, mask.sum(), axis=0)
print(df)
                 0  1  2  3  4  5  6  7  8  9
row attribute                                
0   class        0  0  1  1  0  0  0  0  0  1
    probability  1  1  9  8  9  4  1  4  1  3
1   class        0  0  1  1  0  0  0  0  0  1
    probability  7  6  6  5  0  8  5  4  8  1
2   class        0  0  1  1  0  0  0  0  0  1
    probability  8  8  9  1  4  2  1  5  5  9
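To check the result, or to just select the rows of one sub-index, there are also two label-based options besides the boolean mask. This is a sketch on made-up data, not part of the original answer; it assumes the MultiIndex is sorted, which is the case for one built with from_product from sorted inputs:

```python
import numpy as np
import pandas as pd

# Made-up frame with the same index layout as the sample.
mux = pd.MultiIndex.from_product([np.arange(3), ['class', 'probability']],
                                 names=('row', 'attribute'))
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=mux)

# xs picks one label of a level and drops that level from the result.
class_rows = df.xs('class', level='attribute')
print(class_rows)

# IndexSlice keeps both index levels in the result.
idx = pd.IndexSlice
class_rows_full = df.loc[idx[:, 'class'], :]
print(class_rows_full)
```

xs is handy when the result should be indexed by row only; the IndexSlice form keeps the original (row, attribute) index.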

Details:

print(np.repeat(labels.loc[['class']].values, mask.sum(), axis=0))
[[0 0 1 1 0 0 0 0 0 1]
 [0 0 1 1 0 0 0 0 0 1]
 [0 0 1 1 0 0 0 0 0 1]]
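The question describes a separate single-index DataFrame with one row of values per row label, rather than one row to repeat. A sketch of that case, assuming a hypothetical class_values frame (the name and the data are made up) whose rows are in the same order as the class rows of df; .values is used so the assignment is positional instead of aligned on the index:

```python
import numpy as np
import pandas as pd

# Target frame with the (row, attribute) MultiIndex, filled with zeros.
mux = pd.MultiIndex.from_product([np.arange(3), ['class', 'probability']],
                                 names=('row', 'attribute'))
df = pd.DataFrame(np.zeros((6, 4), dtype=int), index=mux)

# Hypothetical single-index frame: one row of class values per 'row' label.
class_values = pd.DataFrame(np.arange(12).reshape(3, 4),
                            index=pd.Index(np.arange(3), name='row'))

# Select the class rows and overwrite them with the raw array.
# The shapes must match: (number of class rows, number of columns).
mask = df.index.get_level_values('attribute') == 'class'
df.loc[mask] = class_values.values
print(df)
```

The probability rows can be filled the same way with a mask on "probability" and the corresponding frame.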