Given the following data:
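import pandas as pd
from io import StringIO
s = '{"PassengerId":{"0":1,"1":2,"2":3},"Survived":{"0":0,"1":1,"2":1},"Pclass":{"0":3,"1":1,"2":3}}'
df = pd.read_json(StringIO(s))  # StringIO: recent pandas deprecates passing a raw JSON string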
which looks like this:
   PassengerId  Survived  Pclass
0            1         0       3
1            2         1       1
2            3         1       3
Suppose it has been melted into:
m = df.melt()
print(m)
      variable  value
0  PassengerId      1
1  PassengerId      2
2  PassengerId      3
3     Survived      0
4     Survived      1
5     Survived      1
6       Pclass      3
7       Pclass      1
8       Pclass      3
I would like to know how to restore the melted m to the original df.
I tried something like the following:
m=df.melt().pivot(columns='variable', values='value').reset_index(drop=True)
m.columns.name = None
which gives:
   PassengerId  Pclass  Survived
0          1.0     NaN       NaN
1          2.0     NaN       NaN
2          3.0     NaN       NaN
3          NaN     NaN       0.0
4          NaN     NaN       1.0
5          NaN     NaN       1.0
6          NaN     3.0       NaN
7          NaN     1.0       NaN
8          NaN     3.0       NaN
As you can see, each row holds the value of only a single column. I would like to get rid of all those NaN values.
Answer 0 (score: 3)
Use GroupBy.cumcount to create a new column, then pass that column as the index parameter of DataFrame.pivot:
m['new'] = m.groupby('variable').cumcount()
df = m.pivot(columns='variable', values='value', index='new')
print(df)
variable  PassengerId  Pclass  Survived
new
0                   1       3         0
1                   2       1         1
2                   3       3         1
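To see why the counter column matters, here is a minimal sketch that looks at it on its own. It rebuilds the question's data directly with pd.DataFrame instead of read_json, and the names counter and wide are made up for illustration. The counter restarts at 0 for every variable, so it recovers the row position each value had before melting. Without it, pivot falls back to the existing index 0..8 and every value lands on a row of its own, which is where the NaN values in the question come from.

```python
import pandas as pd

# Same data as in the question, built directly (assumption: equivalent to the read_json result).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
})
m = df.melt()

# Number the rows inside each 'variable' group: 0, 1, 2, then 0, 1, 2 again, and so on.
counter = m.groupby("variable").cumcount()
print(counter.tolist())

# Using the counter as the pivot index puts values with the same position on the same row.
wide = m.assign(new=counter).pivot(index="new", columns="variable", values="value")
print(wide)
```

This relies on melt keeping the rows of each column in their original order, and on every column contributing the same number of rows.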
Or, without modifying m:
df = (m.assign(new=m.groupby('variable').cumcount())
       .pivot(columns='variable', values='value', index='new'))
print(df)
variable  PassengerId  Pclass  Survived
new
0                   1       3         0
1                   2       1         1
2                   3       3         1
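Note that the pivoted result still differs from the original df in two cosmetic ways: pivot sorts the columns alphabetically, and it leaves the axis names new and variable behind. A possible follow-up, sketched here under the assumption that the original column order equals the order in which the names first appear in m['variable'], is to reindex the columns and clear the axis names. The name restored is only for illustration.

```python
import pandas as pd

# Same data as in the question, built directly (assumption: equivalent to the read_json result).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
})
m = df.melt()

restored = (
    m.assign(new=m.groupby("variable").cumcount())
     .pivot(index="new", columns="variable", values="value")
     # pivot sorts the columns; put them back in first-appearance order
     .reindex(columns=m["variable"].unique())
     # remove the 'new' and 'variable' axis names left behind by pivot
     .rename_axis(index=None, columns=None)
)
print(restored)
```

If the original frame had columns of different dtypes, melt would have stored them in one common value column, so the dtypes may need to be converted back separately.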