Pandas - Setting values using partial slicing on a MultiIndex

Date: 2017-07-31 11:27:17

Tags: python pandas indexing

I have some code that produces the following empty DataFrame:

>>> import itertools
>>> import numpy as np
>>> import pandas as pd
>>> first = ['foo', 'bar']
>>> second = ['baz', 'can']
>>> third = ['ok', 'ko']
>>> colours = ['blue', 'yellow', 'green']

>>> idx = pd.IndexSlice
>>> ix = pd.MultiIndex.from_arrays(np.array([i for i in itertools.product(first, second, third)]).transpose().tolist(),
                                   names=('first', 'second', 'third'))
>>> df1 = pd.DataFrame(index=ix, columns=colours).sort_index()
>>> print(df1)

                   blue yellow green
first second third                  
bar   baz    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN
      can    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN
foo   baz    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN
      can    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN
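Side note for anyone reproducing this: the product/transpose/from_arrays construction above yields the same index as pd.MultiIndex.from_product, which builds the Cartesian product of the level lists directly. A quick check, assuming only pandas and NumPy are installed:

```python
import itertools

import numpy as np
import pandas as pd

first = ['foo', 'bar']
second = ['baz', 'can']
third = ['ok', 'ko']
names = ('first', 'second', 'third')

# The question's construction: Cartesian product -> 2-D array -> transpose -> one list per level.
ix_arrays = pd.MultiIndex.from_arrays(
    np.array(list(itertools.product(first, second, third))).transpose().tolist(),
    names=names)

# Equivalent and shorter: let pandas build the Cartesian product itself.
ix_product = pd.MultiIndex.from_product([first, second, third], names=names)

print(ix_arrays.equals(ix_product))  # True
```

Both keep the product order (foo/baz/ok first); the .sort_index() call is what reorders the rows to the bar-first layout printed above.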

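What I want to do is fill this empty, MultiIndex-based DataFrame from another given DataFrame, df2, which encodes the same information in its column names, as shown below (columns truncated for clarity):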

     baz_ok_blue  baz_ko_blue  can_ok_blue  can_ko_blue  baz_ok_yellow
foo    -1.385111    -1.014812    -1.419643     1.540341       0.663933
bar     0.445372    -0.226087     0.450982    -1.114169       0.896522

So far I have been trying this:

idx = pd.IndexSlice
for s in second:
    for t in third:
        for c in colours:
            column_name = '{s}_{t}_{c}'.format(s=s, c=c, t=t)
            values = df2[column_name]
            df1.loc[idx[:, s, t], c] = values

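On each iteration the values Series is computed correctly, but pandas does not match the index of values against the first level of df1's MultiIndex. As a result, every value in df1 stays NaN, because pandas is trying to align a MultiIndex with a single-level index. Is there a way around this?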

To give a higher-level view: I am simply trying to rearrange df2 (string-based column names) into the shape of df1 (MultiIndex-based rows).
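The alignment problem described above can be sidestepped inside the loop itself: read the 'first' labels of the selected rows, reorder the df2 column to match them with reindex, and assign a plain NumPy array so pandas writes by position instead of aligning by label. A minimal sketch; df2 here is a made-up stand-in whose values are arbitrary placeholders, not the question's data:

```python
import itertools

import numpy as np
import pandas as pd

first = ['foo', 'bar']
second = ['baz', 'can']
third = ['ok', 'ko']
colours = ['blue', 'yellow', 'green']

ix = pd.MultiIndex.from_product([first, second, third], names=['first', 'second', 'third'])
df1 = pd.DataFrame(index=ix, columns=colours).sort_index()

# Hypothetical stand-in for df2: one column per '<second>_<third>_<colour>',
# filled with arbitrary placeholder numbers.
cols = ['_'.join(p) for p in itertools.product(second, third, colours)]
df2 = pd.DataFrame(np.arange(len(first) * len(cols), dtype=float).reshape(len(first), len(cols)),
                   index=first, columns=cols)

idx = pd.IndexSlice
for s in second:
    for t in third:
        rows = idx[:, s, t]
        # Labels of the 'first' level for the selected rows, in df1's (sorted) order.
        firsts = df1.loc[rows, :].index.get_level_values('first')
        for c in colours:
            values = df2['{s}_{t}_{c}'.format(s=s, t=t, c=c)]
            # Reorder by label, then drop the index so the assignment is positional.
            df1.loc[rows, c] = values.reindex(firsts).to_numpy()

# df1 was created empty, so its columns are object dtype; convert them to floats.
df1 = df1.astype(float)
```

Positional assignment is only safe here because the reindex step puts the values in the same order as the selected rows; the answer below avoids the loop altogether.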

1 Answer:

Answer 0 (score: 2)

You can first use str.split to turn the column names into a MultiIndex, then reshape with stack and reindex (here df is the question's df2):

df.columns = df.columns.str.split('_', expand=True)  # 'baz_ok_blue' -> ('baz', 'ok', 'blue')
print(df)
          baz                 can                 baz
           ok        ko        ok        ko        ok
         blue      blue      blue      blue    yellow
foo -1.385111 -1.014812 -1.419643  1.540341  0.663933
bar  0.445372 -0.226087  0.450982 -1.114169  0.896522
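The split step can be checked on its own. A small sketch using the five column names shown in the question: Index.str.split with expand=True returns a MultiIndex with one level per '_'-separated part, and it is those separate levels that stack can then move into the rows.

```python
import pandas as pd

# The (truncated) column names from the question's df2.
cols = pd.Index(['baz_ok_blue', 'baz_ko_blue', 'can_ok_blue', 'can_ko_blue', 'baz_ok_yellow'])

# expand=True on an Index produces a MultiIndex: one level per split part.
mi = cols.str.split('_', expand=True)

print(mi.nlevels)                     # 3
print(mi[0])                          # ('baz', 'ok', 'blue')
print(list(mi.get_level_values(2)))   # ['blue', 'blue', 'blue', 'blue', 'yellow']
```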

# move column levels 0 and 1 (second, third) into the rows, then align with df1
df = df.stack([0,1]).reindex(index=df1.index, columns=df1.columns)
print(df)
                        blue    yellow  green
first second third                           
bar   baz    ko    -0.226087       NaN    NaN
             ok     0.445372  0.896522    NaN
      can    ko    -1.114169       NaN    NaN
             ok     0.450982       NaN    NaN
foo   baz    ko    -1.014812       NaN    NaN
             ok    -1.385111  0.663933    NaN
      can    ko     1.540341       NaN    NaN
             ok    -1.419643       NaN    NaN
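For comparison, the same reshape can also be written with melt and pivot instead of stack. This is an alternative sketch, not the answerer's method; it assumes every df2 column name has exactly three '_'-separated parts, and the df2 values are hypothetical placeholders:

```python
import itertools

import numpy as np
import pandas as pd

first = ['foo', 'bar']
second = ['baz', 'can']
third = ['ok', 'ko']
colours = ['blue', 'yellow', 'green']

ix = pd.MultiIndex.from_product([first, second, third], names=['first', 'second', 'third'])
df1 = pd.DataFrame(index=ix, columns=colours).sort_index()

# Hypothetical stand-in for df2 with placeholder values.
cols = ['_'.join(p) for p in itertools.product(second, third, colours)]
df2 = pd.DataFrame(np.arange(len(first) * len(cols), dtype=float).reshape(len(first), len(cols)),
                   index=first, columns=cols)

# 1. Long format: one row per (row label, original column name, value).
long = (df2.rename_axis('first')
           .reset_index()
           .melt(id_vars='first', var_name='key', value_name='value'))

# 2. Split 'baz_ok_blue' into its three parts as separate columns.
parts = long['key'].str.split('_', expand=True)
parts.columns = ['second', 'third', 'colour']
long = pd.concat([long, parts], axis=1)

# 3. Pivot the colour back out into columns and align with df1's rows and columns.
result = (long.pivot(index=['first', 'second', 'third'], columns='colour', values='value')
              .reindex(index=df1.index, columns=df1.columns))
```

Because the last step reindexes against df1's index and columns, any (first, second, third, colour) combination missing from df2 comes out as NaN, just as green does in the truncated example above.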