Question

I have the following data structure:

from collections import OrderedDict
import pandas as pd

d = OrderedDict([
    ((5, 3, 1), {'y1': 1}),
    ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}),
    ((5, 4, 2), {'y2': 20}),

    ((6, 3, 1), {'y1': 100}),
    ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}),
    ((6, 4, 2), {'y2': 2000}),
])

df = pd.DataFrame(
    d.values(),
    index=pd.MultiIndex.from_tuples(d.keys(), names=['x3', 'x2', 'x1']),
)

The table looks like this:

            y1    y2
x3 x2 x1            
5  3  1      1   NaN
      2    NaN     2
   4  1     10   NaN
      2    NaN    20
6  3  1    100   NaN
      2    NaN   200
   4  1   1000   NaN
      2    NaN  2000

As you can see, there is a one-to-one mapping between x1 and the columns (x1 = 1: y1, x1 = 2: y2). I want to flatten it to:

         y1    y2
x3 x2            
5  3      1     2
   4     10    20
6  3    100   200
   4   1000  2000

How can I do this?

Edit: or the other way around:

             y
x3 x2 x1            
5  3  1      1
      2      2
   4  1     10
      2     20
6  3  1    100
      2    200
   4  1   1000
      2   2000
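Both target layouts depend on the one-to-one mapping between x1 and the columns. A minimal sketch for checking that premise before reshaping, assuming the mapping is written out as a dict; `mapping` and `mapping_holds` are hypothetical names, not part of pandas:

```python
from collections import OrderedDict
import pandas as pd

# Rebuild the example frame from the question.
d = OrderedDict([
    ((5, 3, 1), {'y1': 1}), ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}), ((5, 4, 2), {'y2': 20}),
    ((6, 3, 1), {'y1': 100}), ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}), ((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
    list(d.values()),
    index=pd.MultiIndex.from_tuples(list(d.keys()), names=['x3', 'x2', 'x1']),
)

# Hypothetical name: the x1 -> column mapping stated in the question.
mapping = {1: 'y1', 2: 'y2'}

def mapping_holds(frame, mapping):
    """Return True if every row has exactly one value, located in column mapping[x1]."""
    for key, (_, row) in zip(frame.index.get_level_values('x1'), frame.iterrows()):
        if row.notna().sum() != 1 or pd.isna(row[mapping[key]]):
            return False
    return True

print(mapping_holds(df, mapping))  # True
```

If the check fails (for example, a row has values in both y1 and y2), the stack/unstack approach would likely stop with a duplicate-entries error, so it is worth knowing before choosing a method.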

Answer 1

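You can use stack to turn the frame into a Series and discard the NaNs, drop the third level (x1) with reset_index, and then reshape with unstack. Older pandas versions drop NaNs in stack automatically, but newer ones may keep them, so an explicit dropna() is the safer choice:

print(df.stack().dropna().reset_index(level=2, drop=True).unstack(2))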
           y1      y2
x3 x2                
5  3      1.0     2.0
   4     10.0    20.0
6  3    100.0   200.0
   4   1000.0  2000.0
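A self-contained version of the stack/reset_index/unstack chain, rebuilding the question's frame so it runs on its own. The explicit dropna() is an addition, under the assumption that your pandas version's stack() may keep NaN rows; where stack() already drops them it changes nothing. `flat` is just an illustrative variable name:

```python
from collections import OrderedDict
import pandas as pd

# Rebuild the example frame from the question.
d = OrderedDict([
    ((5, 3, 1), {'y1': 1}), ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}), ((5, 4, 2), {'y2': 20}),
    ((6, 3, 1), {'y1': 100}), ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}), ((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
    list(d.values()),
    index=pd.MultiIndex.from_tuples(list(d.keys()), names=['x3', 'x2', 'x1']),
)

flat = (
    df.stack()                          # Series indexed by (x3, x2, x1, column name)
    .dropna()                           # keep only the filled cells
    .reset_index(level=2, drop=True)    # drop x1, which the mapping makes redundant
    .unstack(-1)                        # turn the column-name level back into columns
)
print(flat)
```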

If you need to convert back to int, add astype:

print(df.stack().dropna().reset_index(level=2, drop=True).unstack(2).astype(int))
         y1    y2
x3 x2            
5  3      1     2
   4     10    20
6  3    100   200
   4   1000  2000
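A different technique, not the one used in this answer: since each (x3, x2) group has exactly one non-null value per column, GroupBy.first(), which returns the first non-null entry of each column per group, gives the same flattened frame. A sketch under that assumption (`flat` is an illustrative name):

```python
from collections import OrderedDict
import pandas as pd

# Rebuild the example frame from the question.
d = OrderedDict([
    ((5, 3, 1), {'y1': 1}), ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}), ((5, 4, 2), {'y2': 20}),
    ((6, 3, 1), {'y1': 100}), ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}), ((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
    list(d.values()),
    index=pd.MultiIndex.from_tuples(list(d.keys()), names=['x3', 'x2', 'x1']),
)

# Group on the two outer levels; first() skips NaN, so each group keeps
# its single y1 value and its single y2 value.
flat = df.groupby(level=['x3', 'x2']).first().astype(int)
print(flat)
```

Unlike the reset_index/unstack chain, first() does not complain if a group has more than one value in a column; it silently keeps the first one, so it is only equivalent when the one-to-one mapping really holds.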

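Edit: for the long format, drop the fourth level (the former column names) instead, and turn the Series into a DataFrame with to_frame:

print(df.stack().dropna().reset_index(level=3, drop=True).to_frame('y').astype(int))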
             y
x3 x2 x1      
5  3  1      1
      2      2
   4  1     10
      2     20
6  3  1    100
      2    200
   4  1   1000
      2   2000
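An alternative for the long format that uses the x1-to-column mapping directly instead of relying on where the NaNs are. This is a sketch using NumPy fancy indexing; `mapping` and `long_df` are hypothetical names, and the mapping dict is assumed to match the question:

```python
from collections import OrderedDict
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
d = OrderedDict([
    ((5, 3, 1), {'y1': 1}), ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}), ((5, 4, 2), {'y2': 20}),
    ((6, 3, 1), {'y1': 100}), ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}), ((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
    list(d.values()),
    index=pd.MultiIndex.from_tuples(list(d.keys()), names=['x3', 'x2', 'x1']),
)

mapping = {1: 'y1', 2: 'y2'}  # hypothetical: x1 value -> column holding the value
cols = df.index.get_level_values('x1').map(mapping)  # column label for each row
pos = df.columns.get_indexer(cols)                    # column position for each row, -1 if unknown
if (pos < 0).any():
    raise KeyError('an x1 value is missing from the mapping or its column does not exist')
values = df.to_numpy()[np.arange(len(df)), pos]       # pick exactly one cell per row
long_df = pd.DataFrame({'y': values}, index=df.index).astype(int)
print(long_df)
```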

Answer 2

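df2 = df.unstack()                 # move x1 into the columns: (y1, 1), (y1, 2), (y2, 1), (y2, 2)
df2.columns = range(4)             # replace the column MultiIndex with positions 0-3
df3 = df2.drop([1, 2], axis=1)     # drop the all-NaN pairs (y1, 2) and (y2, 1)
df3.columns = ["Y1", "Y2"]         # rename the two remaining columns
df3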

This gives the flattened frame indexed by x3 and x2, with float values in columns Y1 and Y2.
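Positional column handling (range(4), drop([1, 2])) depends on the exact column order that unstack produces. A sketch of the same idea that selects the filled (column, x1) pairs by label instead, assuming the mapping x1 = 1: y1, x1 = 2: y2; `wide` and `flat` are illustrative names:

```python
from collections import OrderedDict
import pandas as pd

# Rebuild the example frame from the question.
d = OrderedDict([
    ((5, 3, 1), {'y1': 1}), ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}), ((5, 4, 2), {'y2': 20}),
    ((6, 3, 1), {'y1': 100}), ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}), ((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
    list(d.values()),
    index=pd.MultiIndex.from_tuples(list(d.keys()), names=['x3', 'x2', 'x1']),
)

wide = df.unstack('x1')  # columns: (y1, 1), (y1, 2), (y2, 1), (y2, 2)
# Keep only the pairs the mapping says are filled, then drop the x1 column level.
flat = wide[[('y1', 1), ('y2', 2)]].droplevel('x1', axis=1)
print(flat)
```

If the layout changes, selecting a missing label raises a KeyError rather than silently keeping the wrong columns.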
