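I'm trying to run a minimization over a set of logloss values, but when I use scipy.optimize.minimize it seems to return a sub-optimal value.
The data comes from a pandas table:
click,prob1,prob2,prob3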
0,0.0023,0.0024,0.012
1,0.89,0.672,0.78
0,0.43,0.023,0.032
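The code that follows refers to `probs` and `y_true`, which the post never defines. A minimal sketch of how they could be built from the table above, assuming the first column (named `click` here) is the 0/1 label and the other three columns are the model probabilities:

```python
import io

import pandas as pd

# The three sample rows from the post; the label column is called "click" here.
csv_text = """click,prob1,prob2,prob3
0,0.0023,0.0024,0.012
1,0.89,0.672,0.78
0,0.43,0.023,0.032
"""

df = pd.read_csv(io.StringIO(csv_text))

# Assumption: "click" is the true label, the remaining columns are model outputs.
y_true = df["click"].to_numpy()
probs = df[["prob1", "prob2", "prob3"]].to_numpy()

print(probs.shape)  # (3, 3): three rows, three models
```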
from scipy.optimize import minimize
from math import log
import numpy as np
import pandas as pd
def logloss(p, y):
    p = max(min(p, 1 - 10e-15), 10e-15)
    return -log(p) if y == 1 else -log(1 - p)
def ensemble_weights(weights, probs, y_true):
    loss = 0
    final_pred = []
    prob_length = len(probs)
    # Blend the model probabilities for each row using the candidate weights
    for i in range(prob_length):
        w_sum = 0
        for index, weight in enumerate(weights):
            w_sum += probs[i][index] * weight
        final_pred.append(w_sum)
    # Accumulate the log loss of the blended predictions
    for index, pred in enumerate(final_pred):
        loss += logloss(pred, y_true[index])
    print(loss / prob_length, 'weights :=', weights)
    return loss / prob_length
## w0 is the initial guess for the minimum of function 'fun'
## This initial guess is that all weights are equal
w0 = [1.0 / probs.shape[1]] * probs.shape[1]
## This sets the bounds on the weights, between 0 and 1
bnds = [(0,1)] * probs.shape[1]
## This sets the constraints on the weights, they must sum to 1
## Or, in other words, 1 - sum(w) = 0
cons = ({'type':'eq','fun':lambda w: 1 - np.sum(w)})
weights = minimize(
    ensemble_weights,
    w0,
    args=(probs, y_true),
    method='SLSQP',
    bounds=bnds,
    constraints=cons
)
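## As a sanity check, make sure the weights do in fact sum to 1
## Note: weights['fun'] is the final loss value, not the sum of the weights
## (the sum would be np.sum(weights['x']))
print("Weights sum to %0.4f:" % weights['fun'])
print(weights['x'])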
To help with debugging, I put a print statement inside the function, which produces the following output.
0.0101326509533 weights := [1. 0. 0.]
0.0101326509533 weights := [1. 0. 0.]
0.0101326509702 weights := [1.00000001 0. 0.]
0.0101292476389 weights := [1.00000000e+00 1.49011612e-08 0.00000000e+00]
0.0101326509678 weights := [1.00000000e+00 0.00000000e+00 1.49011612e-08]
0.0102904525781 weights := [-4.44628778e-10 1.00000000e+00 -4.38298620e-10]
0.00938612854966 weights := [5.00000345e-01 4.99999655e-01 -2.19149158e-10]
0.00961930211064 weights := [7.49998538e-01 2.50001462e-01 -1.09575296e-10]
0.00979499597866 weights := [8.74998145e-01 1.25001855e-01 -5.47881403e-11]
0.00990978430231 weights := [9.37498333e-01 6.25016666e-02 -2.73943942e-11]
0.00998305685424 weights := [9.68748679e-01 3.12513212e-02 -1.36974109e-11]
0.0100300175342 weights := [9.84374012e-01 1.56259881e-02 -6.84884901e-12]
0.0100605546439 weights := [9.92186781e-01 7.81321874e-03 -3.42452299e-12]
0.0100807513117 weights := [9.96093233e-01 3.90676721e-03 -1.71233067e-12]
0.0100942930446 weights := [9.98046503e-01 1.95349723e-03 -8.56215139e-13]
0.0101034594634 weights := [9.99023167e-01 9.76832595e-04 -4.28144378e-13]
0.0101034594634 weights := [9.99023167e-01 9.76832595e-04 -4.28144378e-13]
0.0101034594804 weights := [9.99023182e-01 9.76832595e-04 -4.28144378e-13]
0.0101034593149 weights := [9.99023167e-01 9.76847497e-04 -4.28144378e-13]
0.010103459478 weights := [9.99023167e-01 9.76832595e-04 1.49007330e-08]
Weights sum to 0.0101:
[9.99023167e-01 9.76832595e-04 -4.28144378e-13]
I expected the optimal weights returned to be the ones from this evaluation: 0.00938612854966 weights := [5.00000345e-01 4.99999655e-01 -2.19149158e-10]
Can anyone see an obvious problem?
FYI: this code is actually a hack of the Kaggle Otto script https://www.kaggle.com/hsperr/otto-group-product-classification-challenge/finding-ensamble-weights
Answer 0 (score: 0)
Solved it by passing
options = {'ftol':1e-9}
as part of the minimize call.
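For SLSQP, `ftol` is the precision goal for the objective value in the stopping criterion, and SciPy's default is 1e-6. The losses printed in the question differ only at about the 1e-5 level, so a tighter tolerance plausibly keeps the solver iterating where the default lets it stop. A minimal sketch of where the option goes: `ensemble_loss` is a vectorized stand-in for the question's `ensemble_weights` (an assumption, not the original code), and the three rows are only the sample shown in the question, so the resulting weights are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Sample rows from the question; stand-ins for the full data set.
probs = np.array([
    [0.0023, 0.0024, 0.012],
    [0.89, 0.672, 0.78],
    [0.43, 0.023, 0.032],
])
y_true = np.array([0, 1, 0])


def ensemble_loss(weights, probs, y_true):
    """Mean log loss of the weighted blend (vectorized rewrite of the loops)."""
    eps = 10e-15  # same clipping constant as the question's logloss()
    pred = np.clip(probs @ np.asarray(weights), eps, 1 - eps)
    return -np.mean(y_true * np.log(pred) + (1 - y_true) * np.log(1 - pred))


n_models = probs.shape[1]
w0 = np.full(n_models, 1.0 / n_models)  # equal weights as the initial guess
bnds = [(0, 1)] * n_models              # each weight stays in [0, 1]
cons = ({'type': 'eq', 'fun': lambda w: 1 - np.sum(w)},)  # weights sum to 1

result = minimize(
    ensemble_loss,
    w0,
    args=(probs, y_true),
    method='SLSQP',
    bounds=bnds,
    constraints=cons,
    options={'ftol': 1e-9},  # tighter than the 1e-6 default
)

print("loss:", result.fun)
print("weights:", result.x, "sum:", np.sum(result.x))
```

Printing `np.sum(result.x)` rather than `result.fun` is what actually checks that the weights sum to 1.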