I am trying to use sklearn's SimpleImputer to impute the columns of a pandas DataFrame, as follows:
import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(completeDF_encoded)
FDS1 = imputer.transform(completeDF_encoded)  # transform returns a NumPy array
FDS1
However, transform returns an array rather than a DataFrame, with all NaN values replaced, as shown below:
array([[1.0000e+00, 1.8800e+02, 0.0000e+00, ..., 0.0000e+00, 1.0000e+00,
        0.0000e+00],
       [2.0000e+00, 2.0900e+02, 0.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        1.0000e+00],
       [3.0000e+00, 2.5700e+02, 0.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        1.0000e+00],
       ...,
       [7.9998e+04, 2.5600e+02, 1.0000e+00, ..., 0.0000e+00, 1.0000e+00,
        0.0000e+00],
       [7.9999e+04, 2.5600e+02, 1.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        0.0000e+00],
       [8.0000e+04, 2.5600e+02, 1.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        0.0000e+00]])
How can I get the imputed data back as a DataFrame instead of a NumPy array?
Answer 0: (score: 1)
I use the following code to fill the missing values in each column with that column's mean:
for col in cols:
    # Assign the result back instead of calling fillna(inplace=True) on a column selection
    df[col] = df[col].fillna(df[col].mean())
cols is the list of columns you want to impute, for example:
cols = ['col1', 'col2', 'col3']
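If you would rather stay with SimpleImputer, one option is to wrap the returned array in a new DataFrame that reuses the original columns and index. This is only a minimal sketch: the small `df` below is a made-up stand-in for `completeDF_encoded`, and it assumes no column is entirely NaN (with strategy='mean', SimpleImputer drops such columns by default, so the column count would no longer match).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for completeDF_encoded
df = pd.DataFrame(
    {"col1": [1.0, np.nan, 3.0], "col2": [4.0, 6.0, np.nan]},
    index=["a", "b", "c"],
)

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform returns a NumPy array, so rebuild the DataFrame around it,
# carrying over the original column labels and index
imputed = pd.DataFrame(
    imputer.fit_transform(df),
    columns=df.columns,
    index=df.index,
)
print(imputed)
```

On scikit-learn 1.2 or later, calling `imputer.set_output(transform="pandas")` before fitting should make transform return a DataFrame directly, so the manual wrapping would not be needed.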