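NotFittedError: Estimator not fitted, call `fit` before exploiting the model

Date: 2016-12-02 17:10:22

Tags: python machine-learning scikit-learn

I am running Python 3.5.2 on a Macbook, OSX 10.2.1 (Sierra).

While trying to run some code for the Titanic dataset from Kaggle, I keep getting the following error: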

        

NotFittedError                            Traceback (most recent call last)
 in ()
      6 
      7 # Make your prediction using the test set and print them.
----> 8 my_prediction = my_tree_one.predict(test_features)
      9 print(my_prediction)
     10 

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/tree/tree.py in predict(self, X, check_input)
    429         """
    430 
--> 431         X = self._validate_X_predict(X, check_input)
    432         proba = self.tree_.predict(X)
    433         n_samples = X.shape[0]

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/tree/tree.py in _validate_X_predict(self, X, check_input)
    386         """Validate X whenever one tries to predict, apply, predict_proba"""
    387         if self.tree_ is None:
--> 388             raise NotFittedError("Estimator not fitted, "
    389                                  "call `fit` before exploiting the model.")
    390 

NotFittedError: Estimator not fitted, call `fit` before exploiting the model.

The offending code seems to be this:

# Impute the missing value with the median
test.loc[152, "Fare"] = test["Fare"].median()

# Extract the features from the test set: Pclass, Sex, Age, and Fare.
test_features = test[["Pclass", "Sex", "Age", "Fare"]].values

# Make your prediction using the test set and print them.
my_prediction = my_tree_one.predict(test_features)
print(my_prediction)

# Create a data frame with two columns: PassengerId & Survived. Survived contains your predictions
PassengerId = np.array(test["PassengerId"]).astype(int)
my_solution = pd.DataFrame(my_prediction, PassengerId, columns = ["Survived"])
print(my_solution)

# Check that your data frame has 418 entries
print(my_solution.shape)

# Write your solution to a csv file with the name my_solution.csv
my_solution.to_csv("my_solution_one.csv", index_label = ["PassengerId"])

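Here is a link to the rest of the code.

Since I have already called the `fit` function, I cannot make sense of this error message. Where am I going wrong? Thanks for your time.

Edit: It turns out that the problem is inherited from the previous block of code.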

# Fit your first decision tree: my_tree_one
my_tree_one = tree.DecisionTreeClassifier()
my_tree_one = my_tree_one.fit(features_one, target)

# Look at the importance and score of the included features
print(my_tree_one.feature_importances_)
print(my_tree_one.score(features_one, target))

This line: my_tree_one = my_tree_one.fit(features_one, target)

generates the error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

1 Answer:

Answer 0: (score: 0)

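The error is self-explanatory: either the features_one or the target array contains NaNs or infinite values, so the estimator fails to fit, and therefore you cannot use it for prediction later.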
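As a rough illustration of that chain of events, here is a minimal sketch with made-up toy data (the array values and the name `clf` are assumptions, not taken from the question). It uses an infinite value rather than NaN to trigger the failure, because some newer scikit-learn releases may accept NaN in tree features, while infinity should still be rejected. When `fit` raises, the estimator stays unfitted, and the later `predict` call is what surfaces as NotFittedError.

```python
import numpy as np
from sklearn import tree
from sklearn.exceptions import NotFittedError

# Hypothetical toy data; one feature value is infinite on purpose.
features = np.array([[3.0, 22.0], [1.0, np.inf], [2.0, 35.0]])
target = np.array([0, 1, 1])

clf = tree.DecisionTreeClassifier()

# Step 1: fitting fails during input validation, so no tree is built.
fit_failed = False
try:
    clf.fit(features, target)
except ValueError as exc:
    fit_failed = True
    print("fit failed:", exc)

# Step 2: the estimator object still exists, but it was never fitted.
predict_failed = False
try:
    clf.predict(np.array([[3.0, 30.0]]))
except NotFittedError as exc:
    predict_failed = True
    print("predict failed:", exc)
```

In a notebook the first exception is easy to miss if a later cell is run anyway, which matches the situation described in the question's edit.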

Check those arrays and handle the NaN values accordingly before fitting.
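One possible way to do that is sketched below, assuming the training data lives in a pandas DataFrame. The tiny `train` frame, the column list `cols`, and the choice of median imputation are illustrative assumptions; the right strategy depends on the actual data.

```python
import numpy as np
import pandas as pd
from sklearn import tree

# Hypothetical stand-in for the Titanic training data.
train = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Fare": [7.25, 71.28, np.nan, 13.0],
    "Survived": [0, 1, 1, 0],
})
cols = ["Pclass", "Age", "Fare"]

# Count missing values per column to see which features need attention.
print(train[cols].isnull().sum())

# Fill each column's gaps with that column's median (one of several options).
train[cols] = train[cols].fillna(train[cols].median())

features_one = train[cols].values
target = train["Survived"].values

# Confirm that no NaN or infinite value is left before fitting.
all_finite = bool(np.isfinite(features_one).all())
print("all finite:", all_finite)

my_tree_one = tree.DecisionTreeClassifier(random_state=0)
my_tree_one.fit(features_one, target)
predictions = my_tree_one.predict(features_one)
print(predictions)
```

The same imputation has to be applied to the test set before calling `predict`, otherwise the prediction step can fail on the same validation check.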