Scikit-learn: Error when replacing missing data

Date: 2016-11-19 23:01:38

Tags: python python-2.7 scikit-learn

I am trying to preprocess my data by replacing missing values with the mean.

My code is as follows:

#Load the Data 
import numpy as np
data_2 = np.genfromtxt('data.csv', delimiter=',', skip_header=1)

#the missing values in my dataset are identified by value = 0 
#I'm trying to replace the missing values in the third column 
from sklearn.preprocessing import Imputer 
imp = Imputer(missing_values=0, strategy='mean', axis=0)
imp.fit(data_2[:, 2])

It runs, but it emits these warnings:

/Users/user1/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)

/Users/user1/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)

But my main problem is that it does not fill in the missing data: I printed the data before and after fitting, and nothing changed.

What am I doing wrong?

Update: here are a few rows of my dataset:
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0

1 answer:

Answer 0 (score: 1)

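  • The first few rows you shared do not contain any null (empty) values, so it is hard to explain what is going on.
  • To get an idea, consider this slightly modified version of the dataset, in which some fields are left empty (pandas reads empty fields as NaN):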

    6,148,72,35,0,33.6,0.627,50,1
    1,85,,29,0,26.6,0.351,,
    ,183,64,,0,,0.672,32,1
    1,89,66,23,94,28.1,0.167,21,0
    
  • Missing values can easily be filled using the pandas library:

    #Load Libraries and data
    import pandas as pd
    df = pd.read_csv('data.csv',names=[1,2,3,4,5,6,7,8,9])
    
    #Fill the Null values with the mean
    df = df.fillna(df.mean())
    
  • The names parameter of the read_csv function assigns names to the columns of the CSV file.

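  • The fillna() function fills the missing (NaN) values. Passing df.mean() fills each column's NaNs with that column's mean.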
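In the original scikit-learn code, two separate issues seem to be involved. First, fit() only learns the column mean and never modifies data_2. The filled values come from transform() or fit_transform(), which return a new array that has to be assigned back. Second, the 1d-array warning goes away when the column is passed as a 2D single-feature array with reshape(-1, 1), as the warning text suggests.

The sketch below shows both fixes. It makes several assumptions:
  • It uses sklearn.impute.SimpleImputer, which replaced the Imputer class (deprecated in 0.20 and removed in 0.22).
  • It inlines the four sample rows from the question instead of reading data.csv.
  • It imputes column index 3, because in those sample rows the third column (index 2: 72, 66, 64, 66) contains no zeros.
  • The names csv_text, col and column_2d are only illustrative.

```python
import io

import numpy as np
# Assumption: scikit-learn >= 0.20, where SimpleImputer replaces the old Imputer class
from sklearn.impute import SimpleImputer

# The four sample rows from the question, inlined instead of reading data.csv
csv_text = """6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
"""
data_2 = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# Illustrative choice: index 3 is the column with a single 0 in this sample;
# use 2 for the third column of the real file
col = 3

# As in the question, 0 marks a missing value; zeros are excluded from the mean
imp = SimpleImputer(missing_values=0, strategy="mean")

# reshape(-1, 1) turns the 1d column into a 2d array with one feature,
# which is what the "Passing 1d arrays" warning asks for
column_2d = data_2[:, col].reshape(-1, 1)

# fit() alone only learns the mean; fit_transform() also returns a filled copy,
# which has to be written back because data_2 is not modified in place
data_2[:, col] = imp.fit_transform(column_2d).ravel()

print(data_2[:, col])  # -> [35. 29. 29. 23.]
```

The imputed value is 29.0, the mean of the non-zero entries 35, 29 and 23. The pandas approach from the answer has a similar caveat. fillna() only fills NaN, so zeros used as missing-value markers would first need converting, for example with df.replace(0, np.nan).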