I have the following data structure:
from collections import OrderedDict
import pandas as pd
d = OrderedDict([
    ((5, 3, 1), {'y1': 1}),
    ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}),
    ((5, 4, 2), {'y2': 20}),
    ((6, 3, 1), {'y1': 100}),
    ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}),
    ((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
    d.values(),
    index=pd.MultiIndex.from_tuples(d.keys(), names=['x3', 'x2', 'x1']),
)
The resulting table looks like this:
            y1    y2
x3 x2 x1
5  3  1      1   NaN
      2    NaN     2
   4  1     10   NaN
      2    NaN    20
6  3  1    100   NaN
      2    NaN   200
   4  1   1000   NaN
      2    NaN  2000
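As you can see, there is a one-to-one mapping between x1 and the columns (x1 = 1: y1, x1 = 2: y2), so I would like to flatten the table into
         y1    y2
x3 x2
5  3      1     2
   4     10    20
6  3    100   200
   4   1000  2000
How can I do this?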
Edit: or, the other way around, collapse the two columns into a single column y:
             y
x3 x2 x1
5  3  1      1
      2      2
   4  1     10
      2     20
6  3  1    100
      2    200
   4  1   1000
      2   2000
Answer 0 (score: 2)
You can use stack to remove the NaN values, since it returns a Series, then drop the third level (x1) with reset_index, and finally reshape with unstack:
print (df.stack().reset_index(level=2,drop=True).unstack(2))
          y1      y2
x3 x2
5  3     1.0     2.0
   4    10.0    20.0
6  3   100.0   200.0
   4  1000.0  2000.0
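To make the chain above easier to follow, here is a minimal sketch that runs it one step at a time on the question's data. The intermediate names (stacked, no_x1, flat) are mine, and the explicit dropna() is an addition: I believe whether stack discards NaN by default depends on the pandas version, so the sketch does not rely on it.

```python
from collections import OrderedDict
import pandas as pd

d = OrderedDict([
    ((5, 3, 1), {'y1': 1}),
    ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}),
    ((5, 4, 2), {'y2': 20}),
    ((6, 3, 1), {'y1': 100}),
    ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}),
    ((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
    list(d.values()),
    index=pd.MultiIndex.from_tuples(list(d.keys()), names=['x3', 'x2', 'x1']),
)

# Step 1: move the column labels (y1, y2) into a fourth index level.
# The result is a Series; dropna() guarantees the NaN entries are gone.
stacked = df.stack().dropna()

# Step 2: drop the x1 level, which is redundant with the y1/y2 labels.
no_x1 = stacked.reset_index(level='x1', drop=True)

# Step 3: pivot the innermost level (the y1/y2 labels) back into columns.
flat = no_x1.unstack()

print(flat)
```

Referring to the level by name ('x1') rather than by position is only a readability choice; level=2 as in the answer selects the same level.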
If you need to cast the result to int, add astype:
print (df.stack().reset_index(level=2,drop=True).unstack(2).astype(int))
         y1    y2
x3 x2
5  3      1     2
   4     10    20
6  3    100   200
   4   1000  2000
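The values come back as floats because the NaN entries in the original frame force y1 and y2 to a float dtype. The sketch below, which builds the flattened frame by hand so it stays self-contained, shows one cautious way to do the cast: convert to plain int only when no NaN remains, and otherwise fall back to the nullable 'Int64' dtype. The fallback branch is my own assumption about what one might want, not part of the answer.

```python
import pandas as pd

# Hand-built copy of the flattened result, with the float dtype it
# has right after unstack.
idx = pd.MultiIndex.from_product([[5, 6], [3, 4]], names=['x3', 'x2'])
flat = pd.DataFrame(
    {'y1': [1.0, 10.0, 100.0, 1000.0],
     'y2': [2.0, 20.0, 200.0, 2000.0]},
    index=idx,
)

if flat.notna().all().all():
    # No missing values left, so a plain integer cast is safe.
    flat_int = flat.astype(int)
else:
    # Assumed fallback: the nullable integer dtype keeps missing values.
    flat_int = flat.astype('Int64')

print(flat_int.dtypes)
```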
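Edit: for the single-column layout, drop the fourth level (the y1/y2 labels) instead and convert the Series to a frame: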
print (df.stack().reset_index(level=3,drop=True).to_frame('y').astype(int))
             y
x3 x2 x1
5  3  1      1
      2      2
   4  1     10
      2     20
6  3  1    100
      2    200
   4  1   1000
      2   2000
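For completeness, here is a self-contained sketch of the single-column variant on the question's data. As before, the explicit dropna() is my addition so the result does not depend on how a given pandas version treats NaN in stack. level=-1 addresses the innermost level, the same one the answer selects with level=3. The name long_df is hypothetical.

```python
from collections import OrderedDict
import pandas as pd

d = OrderedDict([
    ((5, 3, 1), {'y1': 1}),
    ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}),
    ((5, 4, 2), {'y2': 20}),
    ((6, 3, 1), {'y1': 100}),
    ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}),
    ((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
    list(d.values()),
    index=pd.MultiIndex.from_tuples(list(d.keys()), names=['x3', 'x2', 'x1']),
)

long_df = (
    df.stack()                          # Series indexed by x3, x2, x1, column label
      .dropna()                         # keep only the cells that hold a value
      .reset_index(level=-1, drop=True) # discard the y1/y2 label level
      .to_frame('y')                    # one-column DataFrame named y
      .astype(int)                      # safe: no NaN remains at this point
      .sort_index()                     # fix the row order explicitly
)

print(long_df)
```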
Answer 1 (score: 0)