Question

I'm trying to compare the accuracy of random forest and XGBoost (on the Titanic dataset), but I don't understand why random forest gives the better result.

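As I understand it, XGBoost is an optimized tree-based model. In each round (each new estimator) it fits a new optimized tree on top of the trees built so far.
Random forest builds many trees (each on a different sample of the data and a different subset of the features) and combines their votes.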
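The difference between the two ensembles can be made concrete with a minimal sketch. It assumes synthetic data from `make_classification` rather than the Titanic data, and it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost so that only scikit-learn is needed. The first part checks that a forest's predicted probability is the mean over its individual trees; the second part shows that a boosted ensemble is built one tree at a time.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic stand-in data (assumption); not the Titanic dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

forest = RandomForestClassifier(n_estimators=25, max_depth=4, random_state=0).fit(X, y)
# Every tree contributes: the forest's probability is the mean over its trees.
mean_of_trees = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
forest_matches_mean = np.allclose(forest.predict_proba(X), mean_of_trees)

# Stand-in for XGBoost (assumption): trees are added sequentially.
boost = GradientBoostingClassifier(n_estimators=25, max_depth=2, random_state=0).fit(X, y)
# staged_predict yields the ensemble's prediction after each added tree.
staged_train_acc = [float(np.mean(pred == y)) for pred in boost.staged_predict(X)]

print("forest equals mean of its trees:", forest_matches_mean)
print("boosting train accuracy after first and last tree:",
      staged_train_acc[0], staged_train_acc[-1])
```

Neither scheme picks a single best tree; both return a combination of all the trees they built.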

I'm working with the Titanic dataset (after handling NaN values and removing some noise). Both models get the same dataset.

I tuned the hyperparameters for both algorithms.

XGBoost model:

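from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

model = XGBClassifier(n_jobs=-1, random_state=42)
hyperparams = {'max_depth': [2, 3, 4, 5, 6, 7, 8],
               'n_estimators': [20, 50, 100, 120],
               'learning_rate': [0.1, 0.2, 0.3, 0.4, 0.5]}

# Sample 40 of the 140 possible combinations, scored by 5-fold CV accuracy.
# Note: if x, y contain the test rows, the search sees the test data.
randomized = RandomizedSearchCV(model, hyperparams, n_iter=40, cv=5, random_state=42, scoring='accuracy')
randomized.fit(x, y)
# best_params_ is the dict of selected settings (best_estimator_ is the fitted model itself)
best_params = randomized.best_params_
# Refit on the training split with the selected hyperparameters
model = XGBClassifier(n_jobs=-1, random_state=42, **best_params)
model.fit(train_df_x, train_df_y)
y_pred = model.predict(test_df_x)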

Random forest model:

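from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

model = RandomForestClassifier(random_state=42, n_jobs=-1)
hyperparams = {'n_estimators': [20, 50, 100, 120],
               'max_depth': [2, 3, 4, 5, 6, 7, 8]}

# Sample 20 of the 28 possible combinations, scored by 5-fold CV accuracy.
# Note: if x, y contain the test rows, the search sees the test data.
randomized = RandomizedSearchCV(model, hyperparams, n_iter=20, cv=5, random_state=42, scoring='accuracy')
randomized.fit(x, y)
# best_params_ is the dict of selected settings (best_estimator_ is the fitted model itself)
best_params = randomized.best_params_
# Refit on the training split with the selected hyperparameters
model = RandomForestClassifier(random_state=42, n_jobs=-1, **best_params)
model.fit(train_df_x, train_df_y)
y_pred = model.predict(test_df_x)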

As you can see:

I used more search iterations for XGBoost (because it has more parameters to tune).
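To put the iteration counts in context, here is a small sketch that counts the combinations in each grid above and the share of them that each search samples. The variable names (`xgb_grid`, `rf_grid`, and so on) are made up for this illustration; the grids and `n_iter` values are copied from the code above.

```python
from sklearn.model_selection import ParameterGrid

xgb_grid = {'max_depth': [2, 3, 4, 5, 6, 7, 8],
            'n_estimators': [20, 50, 100, 120],
            'learning_rate': [0.1, 0.2, 0.3, 0.4, 0.5]}
rf_grid = {'n_estimators': [20, 50, 100, 120],
           'max_depth': [2, 3, 4, 5, 6, 7, 8]}

xgb_total = len(ParameterGrid(xgb_grid))  # 7 * 4 * 5 = 140 combinations
rf_total = len(ParameterGrid(rf_grid))    # 4 * 7 = 28 combinations

xgb_coverage = 40 / xgb_total  # n_iter=40 for XGBoost
rf_coverage = 20 / rf_total    # n_iter=20 for random forest

print(f"XGBoost: 40 of {xgb_total} combinations ({xgb_coverage:.0%})")
print(f"Random forest: 20 of {rf_total} combinations ({rf_coverage:.0%})")
```

So although XGBoost gets twice as many iterations, its search covers roughly 29% of its grid, while the random forest search covers roughly 71% of its own.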

I get the following accuracy results:

Random forest: 86.6%
XGBoost: 85.41%
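A gap of about 1.2 points measured on a single test split may or may not be larger than the split-to-split variation. One way to check, sketched here under assumptions, is to score both models on the same repeated cross-validation folds and compare the mean and spread. The data below is synthetic (`make_classification`), `GradientBoostingClassifier` is a stand-in for `XGBClassifier`, and the hyperparameter values are placeholders, not the tuned ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption); replace with the prepared Titanic x, y.
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=42)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42),
    # Stand-in for XGBClassifier (assumption) so the sketch needs only scikit-learn.
    "boosting": GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=42),
}

# 5 folds x 3 repeats = 15 accuracy scores per model, on identical splits.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)
summary = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X_demo, y_demo, cv=cv, scoring="accuracy")
    summary[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the two means differ by less than the standard deviations, the ranking of the models on one split is not strong evidence either way.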

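Before running the test, I was sure XGBoost would give me the better result.

How can random forest give a better result? What am I missing when using XGBoost?

Why is the random forest result better than XGBoost?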

0 answers: