Flattening a one-to-one mapping in a MultiIndex pandas DataFrame

Time: 2016-11-24 16:48:43

Tags: python pandas

I have the following data structure:

from collections import OrderedDict
import pandas as pd

d = OrderedDict([
    ((5, 3, 1), {'y1': 1}),
    ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}),
    ((5, 4, 2), {'y2': 20}),

    ((6, 3, 1), {'y1': 100}),
    ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}),
    ((6, 4, 2), {'y2': 2000}),
])

df = pd.DataFrame(
    list(d.values()),  # list() so that Python 3 dict views are accepted as well
    index=pd.MultiIndex.from_tuples(list(d.keys()), names=['x3', 'x2', 'x1']),
)

The table looks like this:

            y1    y2
x3 x2 x1            
5  3  1      1   NaN
      2    NaN     2
   4  1     10   NaN
      2    NaN    20
6  3  1    100   NaN
      2    NaN   200
   4  1   1000   NaN
      2    NaN  2000

As you can see, there is a one-to-one mapping between x1 and the columns (x1 = 1: y1, x1 = 2: y2), and I would like to flatten it into

         y1    y2
x3 x2            
5  3      1     2
   4     10    20
6  3    100   200
   4   1000  2000

How can I do this?

Edit: or, going the other way:

             y
x3 x2 x1            
5  3  1      1
      2      2
   4  1     10
      2     20
6  3  1    100
      2    200
   4  1   1000
      2   2000

2 answers:

Answer 0: (score: 2)

You can use stack to drop the NaN values (it produces a Series), then remove the third level with reset_index and reshape with unstack:

print (df.stack().reset_index(level=2,drop=True).unstack(2))
           y1      y2
x3 x2                
5  3      1.0     2.0
   4     10.0    20.0
6  3    100.0   200.0
   4   1000.0  2000.0

If you need to cast to int, add astype:

print (df.stack().reset_index(level=2,drop=True).unstack(2).astype(int))
         y1    y2
x3 x2            
5  3      1     2
   4     10    20
6  3    100   200
   4   1000  2000
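
As a cross-check on the stack/unstack chain, here is a minimal sketch of a different route that is not part of the answer above: group on the two outer levels and take the first non-null value of each column with groupby(...).first(). It assumes, as in the question, that every (x3, x2) pair has exactly one non-NaN value per column; the name flat is only illustrative. It may also be handy on newer pandas versions where, as far as I know, stack no longer drops NaN values by default.

```python
from collections import OrderedDict
import pandas as pd

# Same data as in the question
d = OrderedDict([
    ((5, 3, 1), {'y1': 1}),
    ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}),
    ((5, 4, 2), {'y2': 20}),
    ((6, 3, 1), {'y1': 100}),
    ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}),
    ((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
    list(d.values()),
    index=pd.MultiIndex.from_tuples(list(d.keys()), names=['x3', 'x2', 'x1']),
)

# first() skips NaN, so each (x3, x2) group collapses to its single
# non-null y1 and its single non-null y2 (assumption: one value per column).
flat = df.groupby(level=['x3', 'x2']).first().astype(int)
print(flat)
```

If a group could hold more than one non-NaN value per column, first() would silently keep only the first one, so this sketch only fits a strict one-to-one mapping.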

Edit:

print (df.stack().reset_index(level=3,drop=True).to_frame('y').astype(int))
             y
x3 x2 x1      
5  3  1      1
      2      2
   4  1     10
      2     20
6  3  1    100
      2    200
   4  1   1000
      2   2000
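
The same single-column result can also be sketched without stack. This is an assumption-laden alternative rather than the method above: it back-fills along each row so that the first column ends up holding the row's only value, and it relies on every row having exactly one non-NaN entry. The name tall is only illustrative.

```python
from collections import OrderedDict
import pandas as pd

# Same data as in the question
d = OrderedDict([
    ((5, 3, 1), {'y1': 1}),
    ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}),
    ((5, 4, 2), {'y2': 20}),
    ((6, 3, 1), {'y1': 100}),
    ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}),
    ((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
    list(d.values()),
    index=pd.MultiIndex.from_tuples(list(d.keys()), names=['x3', 'x2', 'x1']),
)

# bfill(axis=1) copies values leftwards across columns within a row, so the
# first column then contains each row's single non-NaN value.
tall = df.bfill(axis=1).iloc[:, 0].astype(int).to_frame('y')
print(tall)
```

The (x3, x2, x1) index is left untouched, so no level has to be dropped afterwards.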

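Answer 1: (score: 0)

# Move x1 into the columns: (y1, 1), (y1, 2), (y2, 1), (y2, 2)
df2 = df.unstack()
# Replace the column MultiIndex with plain positions 0..3
df2.columns = range(4)
# Drop the all-NaN columns (y1, 2) and (y2, 1)
df3 = df2.drop([1,2], axis=1)
df3.columns = ["Y1", "Y2"]
df3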

This gives the flattened frame, indexed by x3 and x2 with the columns Y1 and Y2 (the output was originally posted as an image).
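
The hard-coded positions [1, 2] fit this 2 x 2 case only. A hedged variant of the same unstack idea, not from the answer itself, drops whichever columns are entirely NaN and then discards the x1 column level; the name wide is only illustrative, and the sketch assumes the one-to-one mapping from the question holds.

```python
from collections import OrderedDict
import pandas as pd

# Same data as in the question
d = OrderedDict([
    ((5, 3, 1), {'y1': 1}),
    ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}),
    ((5, 4, 2), {'y2': 20}),
    ((6, 3, 1), {'y1': 100}),
    ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}),
    ((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
    list(d.values()),
    index=pd.MultiIndex.from_tuples(list(d.keys()), names=['x3', 'x2', 'x1']),
)

# Unstack x1 into the columns, then drop the columns that contain only NaN,
# i.e. the (column, x1) pairs that never occur together.
wide = df.unstack('x1').dropna(axis=1, how='all')
# Only one x1 value is left per column, so the x1 column level can go.
wide.columns = wide.columns.droplevel(-1)
wide = wide.astype(int)
print(wide)
```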