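I'm working on a model that will run in real time on end users' machines, so the model's prediction speed is critical.
What I already have is a RandomizedSearchCV that optimizes for the F1 score.
What's missing is a way to factor prediction speed into the choice of the best model.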
from scipy import stats
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

model = SVC()
# Distributions and choices the random search samples hyperparameters from
rand_list = {"C": stats.uniform(0.1, 10000),
             "kernel": ["rbf", "poly"],
             "gamma": stats.uniform(0.01, 100)}
rand_search = RandomizedSearchCV(model, param_distributions = rand_list,
                                 n_iter = 20, n_jobs = 5, cv = 5,
                                 scoring = "f1", refit=True)
rand_search.fit(X_tr_val, y_tr_val) #todo: adjust
print("Validation score of best model: ", rand_search.best_score_)
print("Best parameters: ", rand_search.best_params_)
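What I'd like the random search to do is run a prediction for each parameter combination to measure the prediction speed, and then assign a score based on a combination of F1 and speed.
Pseudocode: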
def scoringFunc:
    score = f1 + SpeedOfThePrediction
    return score
rand_search = RandomizedSearchCV(model, param_distributions = rand_list,
                                 n_iter = 200, n_jobs = 5, cv = 5,
                                 scoring = scoringFunc, refit=True)
Does anyone know how to use prediction speed in the scoring of RandomizedSearchCV?
Answer 0 (score: 0)
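This idea is hard to implement for two reasons:
1. The F1 score lies in the range [0, 1], while what you call SpeedOfThePrediction will lie in a much larger range. Simply summing the two would therefore drown out the influence of the F1 score.
2. A metric function wrapped with make_scorer for RandomizedSearchCV receives only (y_true, y_pred) as input arguments. Inside such a metric function you therefore cannot measure the computation time, i.e. the SpeedOfThePrediction.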
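To make the first reason concrete, here is a minimal sketch with made-up numbers. The candidate names, F1 values, latencies, and the `penalty_per_ms` weight are all hypothetical and chosen only to illustrate the scale mismatch; they are not measurements.

```python
# Hypothetical candidates: values are invented purely for illustration.
candidates = {
    "slow_accurate": {"f1": 0.90, "ms_per_sample": 4.0},
    "fast_decent": {"f1": 0.88, "ms_per_sample": 0.5},
}

def naive_score(c):
    # Raw sum: the latency term (in ms) swamps F1 and even rewards slowness.
    return c["f1"] + c["ms_per_sample"]

def penalized_score(c, penalty_per_ms=0.01):
    # Subtract a small, explicitly weighted latency penalty instead,
    # so that F1 keeps most of its influence. The weight is an assumption.
    return c["f1"] - penalty_per_ms * c["ms_per_sample"]

naive_best = max(candidates, key=lambda k: naive_score(candidates[k]))
penalized_best = max(candidates, key=lambda k: penalized_score(candidates[k]))
print(naive_best, penalized_best)
```

With these numbers the raw sum prefers the slower model, while the weighted penalty prefers the faster one. How large the weight should be depends on how much F1 you are willing to trade per millisecond.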
The documentation gives this example of custom scoring functions:
>>> from sklearn import datasets
>>> from sklearn.svm import LinearSVC
>>> from sklearn.model_selection import cross_validate
>>> from sklearn.metrics import confusion_matrix, make_scorer
>>> # A sample toy binary classification dataset
>>> X, y = datasets.make_classification(n_classes=2, random_state=0)
>>> svm = LinearSVC(random_state=0)
>>> def tn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 0]
>>> def fp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 1]
>>> def fn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 0]
>>> def tp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 1]
>>> scoring = {'tp': make_scorer(tp), 'tn': make_scorer(tn),
...            'fp': make_scorer(fp), 'fn': make_scorer(fn)}
>>> cv_results = cross_validate(svm.fit(X, y), X, y,
...                             scoring=scoring, cv=5)
>>> # Getting the test set true positive scores
>>> print(cv_results['test_tp'])
[10 9 8 7 8]
>>> # Getting the test set false negative scores
>>> print(cv_results['test_fn'])
[0 1 2 3 2]
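As a related sketch: the search object itself records timing in `cv_results_`, where `mean_score_time` is the average time spent scoring (which includes predicting) on the test folds. One possible approach, not taken from the answers here, is to combine it with `mean_test_score` after the search. The toy dataset, the narrowed parameter ranges, and the `penalty_per_second` weight below are assumptions made for illustration.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Toy data standing in for the real training set (assumption).
X, y = make_classification(n_samples=200, n_classes=2, random_state=0)

search = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": stats.uniform(0.1, 100), "kernel": ["rbf", "poly"]},
    n_iter=5, cv=3, scoring="f1",
    refit=False,  # pick the winner ourselves after the search
    random_state=0,
)
search.fit(X, y)

f1 = search.cv_results_["mean_test_score"]
score_time = search.cv_results_["mean_score_time"]  # seconds per test fold

penalty_per_second = 1.0  # hypothetical weight; tune to your latency budget
combined = f1 - penalty_per_second * score_time
best_index = int(np.argmax(combined))
best_params = search.cv_results_["params"][best_index]
print(best_params, combined[best_index])
```

Because `refit=False` is used, the chosen parameters would still have to be fitted on the full training data afterwards, for example with `SVC(**best_params).fit(X, y)`. The recorded times are wall-clock measurements and can be noisy when several jobs run in parallel.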
Answer 1 (score: 0)
I came up with a solution:
import time
from sklearn.metrics import f1_score

def f1SpeedScore(clf, X_val, y_true):
    # Time one predict() call over the validation fold
    time_bef_pred = time.perf_counter()
    y_pred = clf.predict(X_val)
    time_aft_pred = time.perf_counter()
    pred_speed = time_aft_pred - time_bef_pred
    n = len(y_true)
    speed_one_sample = pred_speed / n  # seconds per sample
    speed_penalty = (speed_one_sample * 1000) * 0.01 # 0.01 score penalty per millisecond
    f1 = f1_score(y_true, y_pred)
    score = f1 - speed_penalty
    return score

# A callable passed as scoring is called as scorer(estimator, X, y)
rand_search = RandomizedSearchCV(model, param_distributions = rand_list,
                                 n_iter = iterations, n_jobs = threads, cv = splits,
                                 scoring = f1SpeedScore, refit=True, verbose = verbose)
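It slows things down a bit because you have to run extra predictions. However, since you are only interested in an approximate speed, you can run the timed prediction on a small part of the dataset to speed up the computation.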
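A minimal sketch of that subset idea follows. The function name `f1_speed_score_subset` and the parameters `n_timing` and `penalty_per_ms` are hypothetical; it assumes `X_val` supports row slicing (a NumPy array or a pandas DataFrame) and that timing the first few rows is representative enough.

```python
import time
from sklearn.metrics import f1_score

def f1_speed_score_subset(clf, X_val, y_true, n_timing=50, penalty_per_ms=0.01):
    """Scorer with the (estimator, X, y) signature: F1 on the full fold,
    minus a latency penalty estimated on only the first n_timing rows."""
    n = min(n_timing, len(y_true))
    start = time.perf_counter()
    clf.predict(X_val[:n])  # timed prediction on a small slice only
    ms_per_sample = (time.perf_counter() - start) * 1000 / n
    f1 = f1_score(y_true, clf.predict(X_val))  # untimed, full fold
    return f1 - penalty_per_ms * ms_per_sample
```

It can be passed to the search as `scoring=f1_speed_score_subset`. A single short timing run is noisy, so repeating the timed call a few times and taking the median may give a steadier estimate.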