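I'm using GridSearchCV with Support Vector Regression as the estimator. But I want to change the error function: instead of using the default (R-squared, the coefficient of determination), I would like to define my own custom error function.
I tried to create one with make_scorer, but it didn't work.
I read the documentation and found that it's possible to create custom estimators, but I don't need to remake the entire estimator, only the error/scoring function.
I think I can do it by defining a callable as a scorer, as the docs describe.
But I don't know how to use it with an estimator, in my case SVR. Do I have to switch to a classifier (such as SVC)? And how would I use it?
My custom error function is as follows: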
    def my_custom_loss_func(X_train_scaled, Y_train_scaled):
        error, M = 0, 0
        for i in range(0, len(Y_train_scaled)):
            z = (Y_train_scaled[i] - M)
            if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
                error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
            if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
                error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
            if X_train_scaled[i] > M and Y_train_scaled[i] < M:
                error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
            error += error_i
        return error
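The variable M is not null or zero in practice; I've set it to zero here for simplicity.
Could anyone show an example application of this custom scoring function? Thanks for your help!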
Answer 0 (score: 21)
Jamie has a more fleshed-out example, but here is an example using make_scorer taken straight from the scikit-learn documentation:
    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import make_scorer

    def my_custom_loss_func(ground_truth, predictions):
        diff = np.abs(ground_truth - predictions).max()
        return np.log(1 + diff)

    # loss will negate the return value of my_custom_loss_func,
    # which will be np.log(2), 0.693, given the values for ground_truth
    # and predictions defined below.
    loss = make_scorer(my_custom_loss_func, greater_is_better=False)
    score = make_scorer(my_custom_loss_func, greater_is_better=True)
    ground_truth = [[1], [1]]  # two samples with one feature each (used as X)
    predictions = [0, 1]       # one target per sample (used as y)
    clf = DummyClassifier(strategy='most_frequent', random_state=0)
    clf = clf.fit(ground_truth, predictions)
    loss(clf, ground_truth, predictions)   # -0.69...
    score(clf, ground_truth, predictions)  # 0.69...
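When defining a custom scorer via sklearn.metrics.make_scorer, the convention is that custom functions ending in _score return a value to be maximized, while functions ending in _loss or _error return a value to be minimized. You can make use of this by setting the greater_is_better parameter of make_scorer: it should be True for scorers where higher values are better and False for scorers where lower values are better. GridSearchCV can then optimize in the appropriate direction.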
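To make the sign convention concrete, here is a minimal sketch that wraps the same metric both ways and calls the resulting scorers directly. The toy data and the choice of DummyRegressor and mean_absolute_error are assumptions made purely for illustration; they are not part of the original question.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer, mean_absolute_error

# Toy regression data; the values are made up for illustration.
X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0])

# DummyRegressor always predicts the mean of the training targets.
reg = DummyRegressor(strategy="mean").fit(X, y)

# The plain metric, computed by hand.
raw_error = mean_absolute_error(y, reg.predict(X))

# The same metric wrapped as a loss (negated) and as a score (unchanged).
as_loss = make_scorer(mean_absolute_error, greater_is_better=False)
as_score = make_scorer(mean_absolute_error, greater_is_better=True)

loss_value = as_loss(reg, X, y)    # equals -raw_error
score_value = as_score(reg, X, y)  # equals +raw_error
print(raw_error, loss_value, score_value)
```

Because the loss variant is negated, a search that always maximizes the scorer ends up minimizing the underlying error.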
You can then convert your function into a scorer as follows:
    import numpy as np
    from sklearn.metrics import make_scorer

    # make_scorer calls the function as func(y_true, y_pred), so the first
    # argument receives the true targets and the second the predictions.
    def custom_loss_func(X_train_scaled, Y_train_scaled):
        error, M = 0, 0
        for i in range(0, len(Y_train_scaled)):
            z = (Y_train_scaled[i] - M)
            # Assumption: a sample that matches none of the branches below
            # contributes nothing (otherwise error_i would be undefined or stale).
            error_i = 0
            if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
                error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
            if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
                error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
            if X_train_scaled[i] > M and Y_train_scaled[i] < M:
                error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
            error += error_i
        return error

    custom_scorer = make_scorer(custom_loss_func, greater_is_better=True)
Then pass custom_scorer to GridSearchCV as you would any other scoring function: clf = GridSearchCV(estimator, param_grid, scoring=custom_scorer).
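Since the question mentions that M is not really zero, the following is a rough sketch of a vectorized variant that takes M as a keyword argument. The name custom_loss_vectorized, the M keyword, and the choice to let samples matching no branch contribute zero are all assumptions made for illustration, not part of the original code. make_scorer forwards extra keyword arguments to the wrapped function, which is how M is supplied here.

```python
import numpy as np
from sklearn.metrics import make_scorer

def custom_loss_vectorized(y_true, y_pred, M=0.0):
    """Vectorized sketch of the piecewise error from the question.

    Assumption: samples that match none of the three branches add 0.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    z = y_pred - M
    gap = np.abs(y_pred - y_true)

    both_above = (y_true > M) & (y_pred > M)
    # The three conditions are mutually exclusive, mirroring the original ifs.
    conditions = [
        both_above & (y_true > y_pred),
        both_above & (y_true < y_pred),
        (y_true > M) & (y_pred < M),
    ]
    choices = [
        gap ** (2 * np.exp(z)),
        -(gap ** (2 * np.exp(z))),
        -(gap ** (2 * np.exp(-z))),
    ]
    return float(np.select(conditions, choices, default=0.0).sum())

# Extra keyword arguments given to make_scorer are passed on to the function.
custom_scorer = make_scorer(custom_loss_vectorized, greater_is_better=True, M=0.0)
```

All three branch expressions are evaluated for every sample before np.select picks one, so extreme values may trigger overflow warnings in the branches that end up unused.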
Answer 1 (score: 19)
As you saw, this is done by using make_scorer (docs).
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import make_scorer
    from sklearn.svm import SVR
    import numpy as np

    rng = np.random.RandomState(1)

    # make_scorer calls the function as func(y_true, y_pred).
    def my_custom_loss_func(X_train_scaled, Y_train_scaled):
        error, M = 0, 0
        for i in range(0, len(Y_train_scaled)):
            z = (Y_train_scaled[i] - M)
            # Assumption: a sample that matches none of the branches below
            # contributes nothing (otherwise error_i would be undefined or stale).
            error_i = 0
            if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
                error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
            if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
                error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
            if X_train_scaled[i] > M and Y_train_scaled[i] < M:
                error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
            error += error_i
        return error

    # Generate sample data
    X = 5 * rng.rand(10000, 1)
    y = np.sin(X).ravel()

    # Add noise to targets (integer division so the size is an int)
    y[::5] += 3 * (0.5 - rng.rand(X.shape[0] // 5))

    train_size = 100
    my_scorer = make_scorer(my_custom_loss_func, greater_is_better=True)

    svr = GridSearchCV(SVR(kernel='rbf', gamma=0.1),
                       scoring=my_scorer,
                       cv=5,
                       param_grid={"C": [1e0, 1e1, 1e2, 1e3],
                                   "gamma": np.logspace(-2, 2, 5)})

    svr.fit(X[:train_size], y[:train_size])

    print(svr.best_params_)
    print(svr.score(X[train_size:], y[train_size:]))
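After fitting, the custom scorer's values can also be read back from the search object. The sketch below uses a deliberately simple stand-in metric, max_abs_error, which is a hypothetical helper and not the function from the question, along with a small made-up dataset and grid. It is wrapped with greater_is_better=False, so the stored scores are negated losses.

```python
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

def max_abs_error(y_true, y_pred):
    """Hypothetical stand-in metric: largest absolute residual (lower is better)."""
    return float(np.max(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Small synthetic regression problem, made up for illustration.
rng = np.random.RandomState(0)
X = 5 * rng.rand(120, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(120)

search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1.0, 10.0], "gamma": [0.1, 1.0]},
    scoring=make_scorer(max_abs_error, greater_is_better=False),
    cv=3,
)
search.fit(X, y)

# One mean cross-validated score per parameter combination (negated losses here).
mean_scores = search.cv_results_["mean_test_score"]

# Undo the negation to report the best loss in its original units.
best_loss = -search.best_score_
print(search.best_params_, best_loss)
```

The estimator stays an SVR throughout; no switch to a classifier is needed for a custom scorer.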