I am training a dataset with sklearn's LogisticRegression and SGDClassifier, using log as the loss function, and I am using log loss as the evaluation metric. However, the SGDClassifier gives a high log loss (0.66), while LogisticRegression gives 0.48. I have tried tuning the SGDClassifier's alpha, n_iter, learning_rate, and max_iter, but with no luck.
Logistic regression classifier code:
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss

alpha = [10 ** x for x in range(-5, 3)]  # values tried as C (inverse regularization strength)
log_error_array = []
for i in alpha:
    clf = LogisticRegression(C=i, penalty='l1', random_state=42, class_weight='balanced')
    clf.fit(tr, approval_tr)
    # Calibrate the classifier's probabilities before scoring with log loss.
    sig_clf = CalibratedClassifierCV(clf, method="sigmoid")
    sig_clf.fit(tr, approval_tr)
    predict_y = sig_clf.predict_proba(cv)
    log_error_array.append(log_loss(approval_cv, predict_y, eps=1e-15))
    print('For values of alpha = ', i, "The log loss is:", log_loss(approval_cv, predict_y, eps=1e-15))
Output:
For values of alpha = 1e-05 The log loss is: 0.6649895381677852
For values of alpha = 0.0001 The log loss is: 0.6649874120729949
For values of alpha = 0.001 The log loss is: 0.6649874120658615
For values of alpha = 0.01 The log loss is: 0.546799752877368
For values of alpha = 0.1 The log loss is: 0.49969119164808273
For values of alpha = 1 The log loss is: 0.4768379193463679
For values of alpha = 10 The log loss is: 0.4838656842062527
For values of alpha = 100 The log loss is: 0.4969062791884036
For the SGD classifier:
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

params = {'alpha': [10 ** x for x in range(-4, 3)],
          'learning_rate': ['constant', 'optimal', 'invscaling', 'adaptive']}
# n_iter is the deprecated alias of max_iter, so only max_iter is passed here.
clf = SGDClassifier(loss='log', class_weight='balanced', random_state=42,
                    eta0=10, penalty='l2', max_iter=2000)
tuned_clf = GridSearchCV(clf, param_grid=params, scoring='neg_log_loss',
                         verbose=1, n_jobs=-1)
tuned_clf.fit(tr, approval_tr)
tuned_clf.best_score_
Output:
-0.6887660472967351
Any suggestions on what I am missing here?