Unmelt a pandas DataFrame to remove NaNs

Time: 2020-04-26 13:11:17

Tags: python pandas dataframe data-manipulation

Given the following data:

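import pandas as pd
from io import StringIO  # recent pandas deprecates passing literal JSON strings to read_json
s = '{"PassengerId":{"0":1,"1":2,"2":3},"Survived":{"0":0,"1":1,"2":1},"Pclass":{"0":3,"1":1,"2":3}}'
df = pd.read_json(StringIO(s))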

which looks like this:

   PassengerId  Survived  Pclass
0            1         0       3
1            2         1       1
2            3         1       3

Suppose it has been melted with:

m = df.melt()
print(m)

      variable  value
0  PassengerId      1
1  PassengerId      2
2  PassengerId      3
3     Survived      0
4     Survived      1
5     Survived      1
6       Pclass      3
7       Pclass      1
8       Pclass      3

I would like to know how to turn the melted m back into the original df.

I tried something like the following:

m=df.melt().pivot(columns='variable', values='value').reset_index(drop=True)
m.columns.name = None

which gives:

   PassengerId  Pclass  Survived
0          1.0     NaN       NaN
1          2.0     NaN       NaN
2          3.0     NaN       NaN
3          NaN     NaN       0.0
4          NaN     NaN       1.0
5          NaN     NaN       1.0
6          NaN     3.0       NaN
7          NaN     1.0       NaN
8          NaN     3.0       NaN

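As you can see, each row only contains information about a single column, which leaves a lot of NaN values that I want to get rid of.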
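To see where the NaNs come from, here is a minimal sketch. It assumes the same data built directly with pd.DataFrame instead of read_json, and the name wide is only illustrative. When pivot is called without index, it reuses the melted frame's own index (0 to 8), so each of the 9 melted rows becomes its own output row holding exactly one value.

```python
import pandas as pd

# Same example data as in the question, built directly (assumption: equivalent to read_json(s))
df = pd.DataFrame({"PassengerId": [1, 2, 3],
                   "Survived": [0, 1, 1],
                   "Pclass": [3, 1, 3]})
m = df.melt()

# No index given: pivot reuses m's own index 0..8,
# so every melted row turns into a separate output row.
wide = m.pivot(columns="variable", values="value")

print(wide.shape)                         # (9, 3)
print(wide.notna().sum(axis=1).tolist())  # [1, 1, 1, 1, 1, 1, 1, 1, 1]
```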

1 Answer:

Answer 0 (score: 3)

Use GroupBy.cumcount to create a new column, and pass it as the index parameter of DataFrame.pivot:

# m is the melted frame from m = df.melt(); number the rows 0, 1, 2, ... within each variable
m['new'] = m.groupby('variable').cumcount()

df = m.pivot(columns='variable', values='value', index='new')
print(df)

variable  PassengerId  Pclass  Survived
new                                    
0                   1       3         0
1                   2       1         1
2                   3       3         1
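Why this works, as a small sketch (again assuming the example data built with pd.DataFrame; pos is an illustrative name): melt stacks one column after another while keeping the original row order, so the running counter from cumcount inside each variable group equals the original row position. Each (new, variable) pair is then unique, so pivot neither has duplicates to complain about nor gaps to fill with NaN.

```python
import pandas as pd

df = pd.DataFrame({"PassengerId": [1, 2, 3],
                   "Survived": [0, 1, 1],
                   "Pclass": [3, 1, 3]})
m = df.melt()

# Running counter within each 'variable' group, aligned with m's rows
pos = m.groupby("variable").cumcount()
print(pos.tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2]

# Every (row position, variable) pair occurs exactly once
print(m.assign(new=pos).duplicated(["new", "variable"]).any())  # False
```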

Or, as a single chain:

df = (m.assign(new = m.groupby('variable').cumcount())
       .pivot(columns='variable', values='value', index='new'))
print(df)

variable  PassengerId  Pclass  Survived
new                                    
0                   1       3         0
1                   2       1         1
2                   3       3         1
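The result above still differs from the original df in layout: pivot sorts the columns alphabetically and leaves the axis names new and variable behind. A possible cleanup, sketched under the same assumptions (out is an illustrative name), restores the melted column order with unique(), which keeps first-seen order, and clears the names with rename_axis.

```python
import pandas as pd

df = pd.DataFrame({"PassengerId": [1, 2, 3],
                   "Survived": [0, 1, 1],
                   "Pclass": [3, 1, 3]})
m = df.melt()

out = (m.assign(new=m.groupby("variable").cumcount())
        .pivot(columns="variable", values="value", index="new"))

# Restore the original column order (melt emitted the columns in that order)
out = out[m["variable"].unique()]

# Drop the leftover axis names 'new' and 'variable'
out = out.rename_axis(index=None, columns=None)

print(out.equals(df))  # True
```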