Question

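I'm working with a multi-index pandas DataFrame. My goal is to combine two index levels into a single index level without breaking out of a method chain.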

Example:

I have the following pandas DataFrame:

In[1]: df
Out[1]: 
                    value
year type color          
2018 A    red   -0.236022
          blue  -1.030577
     B    red    1.197374
          blue  -0.496247
2019 A    red   -0.066938
          blue   0.087585
     B    red   -1.702598
          blue   0.085282
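For anyone who wants to reproduce this, a frame of the same shape can be built roughly as follows. This is only a sketch: the numbers are copied from the printout above, and how the original values were generated is not stated.

```python
import pandas as pd

# Three index levels in the same order as the printout: year, type, color.
index = pd.MultiIndex.from_product(
    [[2018, 2019], ["A", "B"], ["red", "blue"]],
    names=["year", "type", "color"],
)

# Values copied by hand from the printed frame (assumed, not regenerated).
values = [
    -0.236022, -1.030577, 1.197374, -0.496247,
    -0.066938, 0.087585, -1.702598, 0.085282,
]

df = pd.DataFrame({"value": values}, index=index)
print(df)
```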

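Now I want to apply a chain of methods to this DataFrame. Somewhere in the middle of the chain, I want to combine two index levels into one. For example, I run a query (type == "A"), then combine two index levels (year and color), then multiply by 4, all without breaking out of the chain: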

df2 = (
    df
    .query('type == "A"')
    .reset_index('type', drop=True)
    .combine_indexes(['year', 'color'])  # <- this is what I'm missing (made-up method)
    .multiply(4)
)

The desired output is:

In[3]: df2
Out[3]: 

               value
year-color          
2018-red   -0.944089
2018-blue  -4.122310
2019-red   -0.267752
2019-blue   0.350339

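In this example I made up the "combine_indexes" method. Does anyone know whether an equivalent exists? I know how to combine two index levels, but only by breaking out of the chain. I need something that is compatible with chaining.

Thanks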

Answer 1

To avoid breaking the chain, I would move set_index to the end:

(df.query('type=="A"')
   .reset_index('type',drop=True)
   .mul(4)
   .assign(year_color=lambda x: [f'{a}-{b}' for a,b in x.index])
   .set_index('year_color')
)

Output (the original values were np.arange(8)):

            value
year_color       
2018-red        0
2018-blue       4
2019-red       16
2019-blue      20
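If the combining step has to sit in the middle of the chain under its own name, one possible variation is to wrap it in a small function and call it through DataFrame.pipe. The sketch below assumes that approach; combine_indexes is a hypothetical helper named after the made-up method in the question, not a pandas method, and the frame uses np.arange(8) as in the output above.

```python
import numpy as np
import pandas as pd


def combine_indexes(frame, levels, sep="-"):
    """Hypothetical helper (not part of pandas): join the given index levels
    into one flat string index. Any index levels not listed are dropped."""
    keys = zip(*(frame.index.get_level_values(level) for level in levels))
    labels = [sep.join(map(str, key)) for key in keys]
    return frame.set_axis(pd.Index(labels, name=sep.join(levels)), axis=0)


index = pd.MultiIndex.from_product(
    [[2018, 2019], ["A", "B"], ["red", "blue"]],
    names=["year", "type", "color"],
)
df = pd.DataFrame({"value": np.arange(8)}, index=index)

df2 = (
    df.query('type == "A"')
      .reset_index("type", drop=True)
      .pipe(combine_indexes, ["year", "color"])  # chain stays intact
      .multiply(4)
)
print(df2)
```

Because pipe passes the intermediate frame as the first argument, the helper can be placed anywhere in the chain rather than only at the end.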

Answer 2

On the question of how to join index levels, let's try Index.map:

tmp = df.query('type == "A"').droplevel('type')
# The money line:
tmp.index = tmp.index.map('{0[0]}-{0[1]}'.format)
tmp.index.name = 'year-color'

tmp

               value
year-color          
2018-red   -0.236022
2018-blue  -1.030577
2019-red   -0.066938
2019-blue   0.087585
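The assignment to tmp.index breaks the chain, so here is a rough sketch of how the same Index.map call might be kept inside one, assuming set_axis and rename_axis are acceptable substitutes for the two assignments. The sample values are copied from the question's printout.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [[2018, 2019], ["A", "B"], ["red", "blue"]],
    names=["year", "type", "color"],
)
values = [
    -0.236022, -1.030577, 1.197374, -0.496247,
    -0.066938, 0.087585, -1.702598, 0.085282,
]
df = pd.DataFrame({"value": values}, index=index)

df2 = (
    df.query('type == "A"')
      .droplevel("type")
      # Same Index.map call as above, applied through a lambda so the
      # intermediate frame never needs a name.
      .pipe(lambda d: d.set_axis(d.index.map("{0[0]}-{0[1]}".format), axis=0))
      .rename_axis("year-color")
      .multiply(4)
)
print(df2)
```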

How to combine pandas index columns in a method chain?
