I'm using a monitor class, as implemented here:

    from sklearn import ensemble
    # predict_stage is a private scikit-learn helper; its location may change between versions.
    from sklearn.ensemble._gradient_boosting import predict_stage


    class Monitor():
        """Monitor for early stopping in Gradient Boosting for classification.

        The monitor checks the validation loss between each training stage. When
        too many successive stages have increased the loss, the monitor will return
        True, stopping the training early.

        Parameters
        ----------
        X_valid : array-like, shape = [n_samples, n_features]
            Validation vectors, where n_samples is the number of samples
            and n_features is the number of features.
        y_valid : array-like, shape = [n_samples]
            Target values (integers in classification, real numbers in
            regression)
            For classification, labels must correspond to classes.
        max_consecutive_decreases : int, optional (default=5)
            Early stopping criterion: when the number of consecutive iterations that
            result in a worse performance on the validation set reaches this value,
            the training stops.
        """

        def __init__(self, X_valid, y_valid, max_consecutive_decreases=5):
            self.X_valid = X_valid
            self.y_valid = y_valid
            self.max_consecutive_decreases = max_consecutive_decreases
            self.losses = []

        def __call__(self, i, clf, args):
            if i == 0:
                self.consecutive_decreases_ = 0
                self.predictions = clf._init_decision_function(self.X_valid)

            # Add the contribution of stage i to the running validation predictions.
            predict_stage(clf.estimators_, i, self.X_valid, clf.learning_rate,
                          self.predictions)
            self.losses.append(clf.loss_(self.y_valid, self.predictions))

            # Despite its name, this counter tracks consecutive increases in loss.
            if len(self.losses) >= 2 and self.losses[-1] > self.losses[-2]:
                self.consecutive_decreases_ += 1
            else:
                self.consecutive_decreases_ = 0

            if self.consecutive_decreases_ >= self.max_consecutive_decreases:
                print("Validation loss increased for {} consecutive iterations: "
                      "stopping early at iteration {}.".format(
                          self.consecutive_decreases_, i))
                return True
            else:
                return False

    params = {
        'n_estimators': nEstimators,
        'max_depth': maxDepth,
        'min_samples_split': minSamplesSplit,
        'min_samples_leaf': minSamplesLeaf,
        'min_weight_fraction_leaf': minWeightFractionLeaf,
        'min_impurity_decrease': minImpurityDecrease,
        'learning_rate': 0.01,
        'loss': 'quantile',
        'alpha': alpha,
        'verbose': 0
    }
    model = ensemble.GradientBoostingRegressor(**params)
    model.fit(XTrain, yTrain, monitor=Monitor(XTest, yTest, 25))

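It works very well. However, it is not clear to me which model this line

    model.fit(XTrain, yTrain, monitor=Monitor(XTest, yTest, 25))

returns:

1) No model
2) The model as trained up to the point of stopping
3) The model from 25 iterations earlier (note the monitor's parameter)

If it is not (3), is it possible to make the estimator return (3)?
How can I do that?
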
Answer 0: (score: 1)

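The model returned is the fit as it stands when the stopping rule halts training, which means your option 2 is correct.

The problem with this monitor code is that the model you end up with includes the 25 extra iterations. The model you actually want is the one in your option 3.

I think the simple (and dumb) way to get it is to run the same model again (with a fixed seed, so the results are identical) but keep only the iterations up to (i - max_consecutive_decreases).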
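
The refit described above could be sketched roughly as follows. The helper names `stages_to_keep` and `refit_truncated` are hypothetical, not part of scikit-learn. The sketch assumes `i` is the 0-based iteration at which the monitor returned True, so the last stage worth keeping has index `i - max_consecutive_decreases`, which is `i - max_consecutive_decreases + 1` stages in total. It also assumes both fits use the same `random_state`.

```python
def stages_to_keep(losses, max_consecutive_decreases):
    """Replay the Monitor's stopping rule on recorded validation losses.

    Returns (stop_iteration, n_keep): the 0-based iteration at which the
    rule fires, and the number of stages fitted before the final run of
    loss increases began. Returns (None, len(losses)) if it never fires.
    """
    consecutive = 0
    for i in range(len(losses)):
        if i >= 1 and losses[i] > losses[i - 1]:
            consecutive += 1
        else:
            consecutive = 0
        if consecutive >= max_consecutive_decreases:
            # The last good stage has index i - max_consecutive_decreases,
            # so the number of stages to keep is that index plus one.
            return i, i - max_consecutive_decreases + 1
    return None, len(losses)


def refit_truncated(params, X_train, y_train, n_keep, seed=0):
    """Hypothetical helper: refit with the same seed but only n_keep stages."""
    # Imported here so stages_to_keep stays usable without scikit-learn.
    from sklearn.ensemble import GradientBoostingRegressor

    truncated = dict(params, n_estimators=n_keep, random_state=seed)
    return GradientBoostingRegressor(**truncated).fit(X_train, y_train)


# Usage sketch (assumes the Monitor instance was kept in a variable and the
# first fit also used random_state=0):
#   monitor = Monitor(XTest, yTest, 25)
#   model.fit(XTrain, yTrain, monitor=monitor)
#   _, n_keep = stages_to_keep(monitor.losses, 25)
#   best_model = refit_truncated(params, XTrain, yTrain, n_keep, seed=0)
```

Keeping a reference to the `Monitor` instance is what makes its `losses` list available after `fit` returns. The refit costs a second training run, which is the "dumb" part of the approach, but it avoids touching the fitted estimator's internals.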