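I have a dataset in which some columns are used for grouping. The same dataset has other numeric columns that contain missing values. I want to fill each missing value with the mean of the group that its row belongs to.
Name of the pandas dataset: data
Columns the groups are based on: ['A', 'B']
Column to impute with group means: ['C']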
Answer 0 (score: 2)
import pandas as pd
import numpy as np
# Sample frame: A and B are the grouping columns, C has missing values
df = pd.DataFrame([[1, 1, 3],
                   [1, 1, 9],
                   [1, 1, np.nan],
                   [2, 2, 8],
                   [2, 1, 4],
                   [2, 2, np.nan],
                   [2, 2, 5]],
                  columns=list('ABC'))
print(df)
   A  B    C
0  1  1  3.0
1  1  1  9.0
2  1  1  NaN
3  2  2  8.0
4  2  1  4.0
5  2  2  NaN
6  2  2  5.0
# Within each (A, B) group, replace NaN in C with that group's mean of C
df['C'] = df.groupby(['A', 'B'])['C'].transform(lambda x: x.fillna(x.mean()))
print(df)
   A  B    C
0  1  1  3.0
1  1  1  9.0
2  1  1  6.0
3  2  2  8.0
4  2  1  4.0
5  2  2  6.5
6  2  2  5.0
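The same result can probably be had without the lambda: compute the per-group mean once with transform('mean'), which returns a Series aligned to the original rows, and pass it to fillna. This is a minimal sketch on the same sample frame as above; the name group_means is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 3], [1, 1, 9], [1, 1, np.nan],
     [2, 2, 8], [2, 1, 4], [2, 2, np.nan], [2, 2, 5]],
    columns=list("ABC"),
)

# Per-row mean of C within each (A, B) group; NaN values are skipped
# when the mean is computed.
group_means = df.groupby(["A", "B"])["C"].transform("mean")

# Only the missing entries of C are replaced; observed values are kept.
df["C"] = df["C"].fillna(group_means)
print(df)
```

On larger frames the built-in 'mean' aggregation tends to be faster than a Python lambda, since it avoids calling a function once per group.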
Answer 1 (score: 0)
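# Fill NaN in every column with that column's overall mean
df = df.fillna(df.mean())
This fills the NaN values in column C with 5.8, the mean of column 'C'. Note that this is the mean over the whole column, not the mean of each group.
Output: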
print(df)
   A  B    C
0  1  1  3.0
1  1  1  9.0
2  1  1  5.8
3  2  2  8.0
4  2  1  4.0
5  2  2  5.8
6  2  2  5.0
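The overall column mean can still be useful as a fallback: if a group has no observed value in C at all, its group mean is itself NaN and the group-based fill leaves the gap in place. The sketch below combines both steps. The frame is hypothetical (it is not the sample data above) and was chosen so that group (3, 3) is entirely missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group (3, 3) has no observed value in C.
data = pd.DataFrame({
    "A": [1, 1, 2, 2, 3],
    "B": [1, 1, 2, 2, 3],
    "C": [2.0, np.nan, 4.0, np.nan, np.nan],
})

# Overall mean of the observed values, computed before any filling.
overall_mean = data["C"].mean()

# Group means aligned to the rows; NaN where a group is entirely missing.
group_means = data.groupby(["A", "B"])["C"].transform("mean")

# First fill from the group mean, then fall back to the overall mean.
data["C"] = data["C"].fillna(group_means).fillna(overall_mean)
print(data)
```

Whether a global fallback is appropriate depends on the data; leaving such rows as NaN is an equally reasonable choice.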