Question

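I'm using a random forest model for the first time, and I've run into a problem with quantifying its accuracy.

Currently, I split the dataset (30% as the test size), fit the model, predict y values with the model, and then score the model against the predicted test values. However, I'm getting 100% accuracy, and I'd like to know whether that comes from the parameters I set on the model or from a syntax mistake I made along the way.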

Splitting into train and test sets

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=1)

Creating and fitting the model

# Import the model we are using
from sklearn.ensemble import RandomForestRegressor

# Instantiate model with 1000 decision trees
rf = RandomForestRegressor(n_estimators = 1000,
                           random_state = 42,
                           min_samples_split = 10,
                           max_features = "sqrt",
                           bootstrap = True)

# Train the model on training data
rf.fit(X_train, y_train)

Predicting on the test set and computing accuracy

y_pred = rf.predict(X_test)

print("Accuracy:", round((rf.score(X_test, y_pred)*100),2), "%")

>> 100.0%

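I'm definitely learning as I go, though I've had some formal training. I'm really just excited about the modeling side of things, but I want to figure out what mistakes I'm making as I keep learning this process.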

Answer 1

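You're almost there! The score() method accepts X_test and y_test. The logic behind score():

# simplified logic behind score()

def score(X, y):
  y_predicted = model.predict(X)
  value = compute_metric(y, y_predicted)
  return value

The logic above is only there to illustrate how score() works.

To get the score in your code:

rf.score(X_test, y_test)

You will get the R^2 score (see the docs). Do you see now why you were getting 100%?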
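To make the cause concrete, here is a minimal sketch on synthetic data. The make_regression dataset and the smaller forest are assumptions chosen so the example runs quickly; they stand in for the asker's own X and y. Passing y_pred as the second argument makes score() predict on X_test again and compare those predictions with themselves, so the result is 1.0 regardless of how good the model is.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (assumption, not the original dataset)
X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

# Fewer trees than in the question, only to keep the sketch fast
rf = RandomForestRegressor(
    n_estimators=50,
    random_state=42,
    min_samples_split=10,
    max_features="sqrt",
    bootstrap=True,
)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)

# Compares the model's predictions with themselves: a perfect score by construction
self_score = rf.score(X_test, y_pred)

# Compares the model's predictions with the held-out true values
true_score = rf.score(X_test, y_test)

print("score(X_test, y_pred):", round(self_score * 100, 2), "%")
print("score(X_test, y_test):", round(true_score * 100, 2), "%")
```

The first number should always come out as 100%, while the second is the R^2 value that actually reflects how well the model generalizes.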

If you would like to get other metrics, you need to compute the predictions yourself and use the regression metrics -> https://scikit-learn.org/stable/modules/classes.html#regression-metrics
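As a rough illustration of that approach, the sketch below computes a few common regression metrics from sklearn.metrics. The synthetic data and the choice of MAE, RMSE and R^2 are assumptions for the example; any of the metrics on the linked page can be used the same way, always with the true values first and the predictions second.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (assumption, not the original dataset)
X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

rf = RandomForestRegressor(n_estimators=50, random_state=42)
rf.fit(X_train, y_train)

# Compute predictions once, then feed them to each metric
y_pred = rf.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)   # average absolute error
mse = mean_squared_error(y_test, y_pred)    # average squared error
rmse = mse ** 0.5                           # same units as y
r2 = r2_score(y_test, y_pred)               # same value as rf.score(X_test, y_test)

print("MAE: ", round(mae, 2))
print("RMSE:", round(rmse, 2))
print("R^2: ", round(r2, 3))
```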

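You can also use AutoML as a learning aid (for you to learn from, not the model). You can run AutoML to create baseline models, and it will compute many metrics for you. Then you can write your own script and compare the results.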
