Python: Using Imputer to fill NaN values by DataFrame index

Posted: 2017-04-04 17:40:44

Tags: python-3.x dataframe scikit-learn nan imputation

I have some data containing NaN values, and I want to fill them in with an Imputer:

from sklearn.preprocessing import Imputer 
imp = Imputer(missing_values='NaN', strategy='mean', axis=1) 
cleaned_data = imp.fit_transform(original_data)

As far as I can tell, the imputer works over each entire column. For example, given this data:

            Point1        Point2
S.No
             2              NaN
1            NaN            4
             2              NaN
             NaN            4
2            2              NaN
             NaN            4

after applying the imputer, the data looks like this:

            Point1        Point2
S.No
             2              2
1            1              4
             2              2
             1              4
2            2              2
             1              4
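For comparison, here is a minimal sketch of the column-wise behavior using SimpleImputer, the class that replaced Imputer in scikit-learn 0.20 (Imputer was removed in 0.22). The sample values and the S.No grouping below are hypothetical and are not the numbers from the tables above. With strategy="mean", each NaN is replaced by the mean of its whole column, and the S.No index plays no role.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical sample data; the S.No grouping [1, 1, 1, 2, 2, 2] is an assumption.
df = pd.DataFrame(
    {"Point1": [2, np.nan, 4, 6, np.nan, 8],
     "Point2": [np.nan, 4, 2, np.nan, 10, 6]},
    index=pd.Index([1, 1, 1, 2, 2, 2], name="S.No"),
)

# One mean per column, computed over all rows and ignoring the S.No index.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = pd.DataFrame(imp.fit_transform(df), columns=df.columns, index=df.index)
print(filled)
```

Here every NaN in Point1 becomes 5.0 and every NaN in Point2 becomes 5.5, whichever S.No group the row belongs to.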

But I want the imputer to work separately within each value of the index named S.No, like this:

            Point1        Point2
S.No
             2              1.33
1            1.333          4
             2              1.33
             0.667          4
2            2              2.667
             0.667          4

Can Imputer be used this way, or is there an alternative approach for a pandas DataFrame?
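One possible pandas-only alternative, sketched under the assumption that rows sharing an S.No value form one group, is to group by the index level and fill each NaN with the mean of its own group. The sample data is hypothetical. If a column is entirely NaN within a group, its group mean is also NaN, so those cells stay NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the S.No grouping [1, 1, 1, 2, 2, 2] is an assumption.
df = pd.DataFrame(
    {"Point1": [2, np.nan, 4, 6, np.nan, 8],
     "Point2": [np.nan, 4, 2, np.nan, 10, 6]},
    index=pd.Index([1, 1, 1, 2, 2, 2], name="S.No"),
)

# groupby(level="S.No") splits the rows by index value; transform returns a frame
# aligned with the original rows, where each NaN is filled with its group's mean.
filled = df.groupby(level="S.No").transform(lambda s: s.fillna(s.mean()))
print(filled)
```

With this data, the NaNs in Point1 become 3.0 (S.No 1) and 7.0 (S.No 2), and the NaNs in Point2 become 3.0 and 8.0. A plain column-wise mean would have given 5.0 and 5.5 instead.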

1 answer:

Answer 0 (score: 0)

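import numpy as np
from sklearn.impute import SimpleImputer  # replaces sklearn.preprocessing.Imputer, removed in scikit-learn 0.22
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
float_cols = Data.select_dtypes(include=['float']).columns
for s_no in Data.index.unique():  # one pass per S.No value
    mask = Data.index == s_no  # fit on this group's rows only, so NaNs get the group mean
    Data.loc[mask, float_cols] = imp.fit_transform(Data.loc[mask, float_cols])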
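The idea in this answer, fitting a separate imputer on the rows of each S.No value, can be checked end to end with a self-contained sketch like the following. The sample data is hypothetical. It also assumes no float column is entirely NaN within a group: recent scikit-learn versions drop such columns from the output unless keep_empty_features=True, which would make the assignment fail with a shape mismatch.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical sample data; the S.No grouping [1, 1, 1, 2, 2, 2] is an assumption.
Data = pd.DataFrame(
    {"Point1": [2, np.nan, 4, 6, np.nan, 8],
     "Point2": [np.nan, 4, 2, np.nan, 10, 6]},
    index=pd.Index([1, 1, 1, 2, 2, 2], name="S.No"),
)

imp = SimpleImputer(missing_values=np.nan, strategy="mean")
float_cols = Data.select_dtypes(include=["float"]).columns
for s_no in Data.index.unique():
    mask = Data.index == s_no  # boolean row mask for this S.No group
    # Refit on this group's rows only, then write the imputed values back in place.
    Data.loc[mask, float_cols] = imp.fit_transform(Data.loc[mask, float_cols])
print(Data)
```

Because the imputer is refit for every group, each NaN receives the mean of its own S.No group (3.0 and 7.0 for Point1, 3.0 and 8.0 for Point2 here). The pandas groupby/transform route avoids the explicit loop and yields the same numbers on this data.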