Imputer's fit method throws "missing 1 required positional argument: 'X'" error

Asked: 2018-06-23 10:08:27

Tags: machine-learning scikit-learn python-3.6 spyder

I am trying to handle missing data in the data preprocessing stage, and I have been carefully following a Udemy tutorial.

This is my dataset, "Data.csv":

Country Age Salary  Purchased
France  44  72000   No
Spain   27  48000   Yes
Germany 30  54000   No
Spain   38  61000   No
Germany 40          Yes
France  35  58000   Yes
Spain       52000   No
France  48  79000   Yes
Germany 50  83000   No
France  37  67000   Yes 

This is the complete code:

    # Data Preprocessing

    #Importing Libraries

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    dataset = pd.read_csv('Data.csv')
    X = dataset.iloc[:, :-1].values
    Y = dataset.iloc[:, -1].values

    # Taking care of missing data

    from sklearn.preprocessing import Imputer
    imputer = Imputer(missing_values = "NaN", strategy = "mean", axis = 0)

    #This line below throws the error
    imputer = Imputer.fit(X[:, 1:3])
    X[:, 1:3] = imputer.transform(X[:, 1:3])

The code above runs fine in the tutorial video, but when I run it I get the following error:

    imputer = Imputer.fit(X[:, 1:3])
    Traceback (most recent call last):

      File "<ipython-input-3-dddb27392326>", line 1, in <module>
        imputer = Imputer.fit(X[:, 1:3])

    TypeError: fit() missing 1 required positional argument: 'X'

I am using the following setup:

OS: Windows 8.1 (the tutorial uses a Mac), IDE: Spyder 3.2.8, Python 3.6

Can someone help me debug this error?

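1 answer:

Answer 0 (score: 0)

I am using sklearn version 0.19.1. The error in your code is that you call fit on the class (Imputer.fit) instead of on the instance (imputer.fit); imputer is the instance of Imputer in your code. When fit is called on the class, X[:, 1:3] is taken as self, so the required argument X is reported as missing. You can also use Imputer's fit_transform method to fit and transform the data in one go, like this: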

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import Imputer
    import pandas as pd

    dataset = pd.read_csv('Data.csv')
    X = dataset.iloc[:, :-1].values
    Y = dataset.iloc[:, -1].values

    # Taking care of missing data
    imputer = Imputer(missing_values = "NaN", strategy = "mean", axis = 0)

    X[:, 1:3] = imputer.fit_transform(X[:, 1:3])

This changes the array X to:

    array([['France', 44.0, 72000.0],
           ['Spain', 27.0, 48000.0],
           ['Germany', 30.0, 54000.0],
           ['Spain', 38.0, 61000.0],
           ['Germany', 40.0, 63777.77777777778],
           ['France', 35.0, 58000.0],
           ['Spain', 38.77777777777778, 52000.0],
           ['France', 48.0, 79000.0],
           ['Germany', 50.0, 83000.0],
           ['France', 37.0, 67000.0]], dtype=object)
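
The two filled-in values can be checked by hand. The sketch below is not how Imputer is implemented; it is a plain-Python illustration of the "mean" strategy applied per column, using the Age and Salary values from Data.csv. The helper name `fill_with_mean` and the use of `None` for a missing cell are assumptions made for this example only.

```python
from statistics import mean

# Age and Salary columns copied from Data.csv; None marks a missing cell.
ages = [44, 27, 30, 38, 40, 35, None, 48, 50, 37]
salaries = [72000, 48000, 54000, 61000, None, 58000, 52000, 79000, 83000, 67000]


def fill_with_mean(column):
    """Hypothetical helper: replace None with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    column_mean = mean(observed)
    return [float(column_mean) if value is None else float(value) for value in column]


filled_ages = fill_with_mean(ages)
filled_salaries = fill_with_mean(salaries)

print(filled_ages[6])      # the Age that was missing for Spain
print(filled_salaries[4])  # the Salary that was missing for Germany
```

The printed values should be roughly 38.78 and 63777.78, which match the entries that replaced the blanks in the array above.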

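As a side note, avoid giving a class instance the same name as the class itself. I left the names unchanged in my answer so that the error in your code stays easy to see.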
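
The same TypeError can be reproduced without scikit-learn, which may make the cause easier to see. In the sketch below, `MeanFiller` is a made-up stand-in class, not part of any library; it only has a `fit(self, X)` method with the same shape as the one in the question.

```python
class MeanFiller:
    """Hypothetical stand-in for an estimator; not part of scikit-learn."""

    def fit(self, X):
        # Store the mean of the data on the instance and return the instance.
        self.mean_ = sum(X) / len(X)
        return self


# Calling through the class: the list is bound to `self`, so `X` is missing.
try:
    MeanFiller.fit([1.0, 2.0, 3.0])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Calling through an instance: Python passes the instance as `self` itself.
filler = MeanFiller()
fitted = filler.fit([1.0, 2.0, 3.0])
print(fitted.mean_)
```

A name such as `filler`, which differs from the class name by more than letter case, makes it harder to type the class where the instance was intended.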