I'm running Python 3.5.2 on a MacBook with OS X 10.2.1 (Sierra).
While trying to run some code for the Titanic dataset from Kaggle, I keep getting the following error:
NotFittedError                            Traceback (most recent call last)
in ()
      6
      7 # Make your prediction using the test set and print them.
----> 8 my_prediction = my_tree_one.predict(test_features)
      9 print(my_prediction)
     10

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/tree/tree.py in predict(self, X, check_input)
    429         """
    430
--> 431         X = self._validate_X_predict(X, check_input)
    432         proba = self.tree_.predict(X)
    433         n_samples = X.shape[0]

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/tree/tree.py in _validate_X_predict(self, X, check_input)
    386         """Validate X whenever one tries to predict, apply, predict_proba"""
    387         if self.tree_ is None:
--> 388             raise NotFittedError("Estimator not fitted, "
    389                                  "call `fit` before exploiting the model.")
    390

NotFittedError: Estimator not fitted, call `fit` before exploiting the model.
The offending code seems to be this:
# Impute the missing value with the median
test.loc[152, "Fare"] = test["Fare"].median()
# Extract the features from the test set: Pclass, Sex, Age, and Fare.
test_features = test[["Pclass", "Sex", "Age", "Fare"]].values
# Make your prediction using the test set and print them.
my_prediction = my_tree_one.predict(test_features)
print(my_prediction)
# Create a data frame with two columns: PassengerId & Survived. Survived contains your predictions
PassengerId = np.array(test["PassengerId"]).astype(int)
my_solution = pd.DataFrame(my_prediction, PassengerId, columns = ["Survived"])
print(my_solution)
# Check that your data frame has 418 entries
print(my_solution.shape)
# Write your solution to a csv file with the name my_solution.csv
my_solution.to_csv("my_solution_one.csv", index_label = ["PassengerId"])
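Here is a link to the rest of the code.
Since I have already called the 'fit' function, I cannot understand this error message. Where am I going wrong? Thanks for your time.
Edit: It turns out the problem was inherited from the previous block of code.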
# Fit your first decision tree: my_tree_one
my_tree_one = tree.DecisionTreeClassifier()
my_tree_one = my_tree_one.fit(features_one, target)
# Look at the importance and score of the included features
print(my_tree_one.feature_importances_)
print(my_tree_one.score(features_one, target))
The line my_tree_one = my_tree_one.fit(features_one, target)
generates this error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
Answer 0 (score: 0)
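The error is self-explanatory: either the features_one or the target array does contain NaNs or infinite values, so the estimator fails to fit, and therefore you cannot use it for prediction later.
Check those arrays and handle the NaN values accordingly before fitting.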
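As a rough illustration of that check, here is a minimal sketch using pandas and NumPy. The small train frame and its values are made up for the example, Sex is assumed to be already encoded as 0/1, and filling gaps with the column median is only one possible strategy; your own data may call for something different.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic training frame; the values are made up.
train = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": [0, 1, 1, 0],          # assumed to be already encoded as 0/1
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Fare": [7.25, 71.28, np.nan, 13.0],
    "Survived": [0, 1, 1, 0],
})
feature_cols = ["Pclass", "Sex", "Age", "Fare"]

# Count missing values per column to see which features would break fit().
nan_counts = train[feature_cols].isnull().sum()
print(nan_counts)

# Fill the gaps with each column's median (one strategy among several).
train[feature_cols] = train[feature_cols].fillna(train[feature_cols].median())

features_one = train[feature_cols].values
target = train["Survived"].values

# Both checks should be True before calling fit().
features_are_finite = bool(np.isfinite(features_one).all())
target_is_finite = bool(np.isfinite(target).all())
print(features_are_finite, target_is_finite)
```

Once both checks pass, features_one and target can be passed to my_tree_one.fit(features_one, target). The same imputation should then be applied to the test features before calling predict.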