Regularized logistic regression cost function: log(1 - p) = inf

Time: 2018-10-14 14:25:24

Tags: python logistic-regression

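I am trying to implement the logistic regression cost function.
I have tested my implementation, and it works correctly on several different datasets.
However, when I tried it on a new dataset, I found that the second part of the equation below (term2) is always inf. The problem is that the value passed to np.log() is 0, so it returns an infinite value. This happens because sigmoid(hypothesis(x,theta)) evaluates to exactly 1.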

term1 = -y*(np.log(sigmoid(hypothesis(x,theta))))
term2 = ((1-y)*(np.log(1 - sigmoid(hypothesis(x,theta)))))
infunc1 = term1 - term2 
infunc2 = (lambda_*np.sum(theta[1:]**2))/(2*m)
j = (np.sum(infunc1)/m)+infunc2

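I think the first solution would be to add a very small value to the 0 so that inf does not occur, but I don't know whether that is correct (it is based on this question).

What should I do when the value computed from the features and weights comes out as zero and is then passed to log?

Thanks for any suggestions. Happy coding.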

1 answer:

Answer 0: (score: 0)

As a friend said, you just need to add a very small value to the argument passed to the log function.
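One way this could look is sketched below. It is only an illustration under some assumptions: hypothesis(x, theta) is taken to be the plain linear score x @ theta, sigmoid is the standard logistic function, and the eps parameter and the use of np.clip are choices made here for the example, not something taken from the question. Clipping the probability into [eps, 1 - eps] has the same effect as adding a small value inside each log, and it protects both term1 and term2.

```python
import numpy as np


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def hypothesis(x, theta):
    """Linear score; assumed form of the question's hypothesis()."""
    return x @ theta


def cost(x, y, theta, lambda_, eps=1e-15):
    """Regularized logistic regression cost with clipped probabilities.

    eps is a hypothetical parameter introduced for this sketch.
    """
    m = y.shape[0]
    # Keep p strictly inside (0, 1) so that neither log() receives 0.
    p = np.clip(sigmoid(hypothesis(x, theta)), eps, 1.0 - eps)
    term1 = -y * np.log(p)
    term2 = (1 - y) * np.log(1 - p)
    # The bias weight theta[0] is left out of the penalty, as in the question.
    reg = lambda_ * np.sum(theta[1:] ** 2) / (2 * m)
    return np.sum(term1 - term2) / m + reg


# Scores of +50 and -50 make sigmoid() round to exactly 1.0 (or nearly 0.0)
# in float64, which is the situation described in the question.
x = np.array([[1.0, 50.0], [1.0, -50.0]])
y = np.array([0.0, 1.0])
theta = np.array([0.0, 1.0])
print(np.isfinite(cost(x, y, theta, lambda_=1.0)))
```

The cost stays finite, but a badly misclassified example still contributes a large penalty (roughly -log(eps)), so the choice of eps affects how large the cost can get.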