Question

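I'm trying to apply what I learned in Andrew Ng's Coursera course. I have already implemented this same algorithm, in the same way, on the Kaggle Titanic dataset without problems, but with this data (UFC fights) I'm getting a negative cost. I stripped the dataset down to just two features (the opponent and the fight ending) and then took their z-scores.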

This is my design matrix (it's actually much bigger, but I get the same negative cost when it's this small):

array([[ 1.        , -0.50373455, -0.35651205],
   [ 1.        , -1.54975476,  0.84266484],
   [ 1.        ,  0.63737841, -1.55568894],
   [ 1.        ,  1.11284214,  0.84266484],
   [ 1.        , -1.07429103,  0.84266484],
   [ 1.        , -1.07429103, -1.55568894],
   [ 1.        ,  0.25700742,  0.84266484],
   [ 1.        , -1.83503301, -0.35651205],
   [ 1.        ,  1.20793489, -0.35651205],
   [ 1.        ,  1.58830588, -1.55568894],
   [ 1.        , -1.16938378,  0.84266484],
   [ 1.        , -0.78901279, -0.35651205],
   [ 1.        , -0.50373455, -1.55568894],
   [ 1.        ,  1.0177494 , -0.35651205],
   [ 1.        , -0.21845631,  0.84266484],
   [ 1.        ,  0.92265665, -1.55568894],
   [ 1.        ,  0.06682193,  0.84266484],
   [ 1.        ,  1.30302764, -0.35651205],
   [ 1.        ,  0.44719292, -0.35651205],
   [ 1.        , -0.69392004,  0.84266484],
   [ 1.        ,  1.39812038, -1.55568894],
   [ 1.        , -0.97919828,  0.84266484],
   [ 1.        ,  0.16191468,  0.84266484],
   [ 1.        , -1.54975476,  0.84266484],
   [ 1.        , -0.02827082,  0.84266484],
   [ 1.        ,  0.63737841, -0.35651205],
   [ 1.        , -0.88410554,  0.84266484],
   [ 1.        ,  0.06682193,  0.84266484],
   [ 1.        , -1.73994026,  0.84266484],
   [ 1.        , -0.12336356,  0.84266484],
   [ 1.        , -0.97919828,  0.84266484],
   [ 1.        ,  0.8275639 , -1.55568894],
   [ 1.        ,  0.73247116,  0.84266484],
   [ 1.        ,  1.68339863, -1.55568894],
   [ 1.        ,  0.35210017, -1.55568894],
   [ 1.        , -0.02827082,  0.84266484],
   [ 1.        ,  1.30302764,  0.84266484]])

My weight vector is initialised to all zeros:

array([[0.],
   [0.],
   [0.]])

For completeness, here is the Y vector:

array([[0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [1],
       [0],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1]], dtype=uint8)

Here are my cost function and my sigmoid/predict functions:

def cost_function(X, Y, theta):
    m = len(Y)
    h = predict(X,theta)
    cost = (np.dot((-Y.T), np.log(h)) - np.dot((1-Y).T, np.log(1-h))) / m
    return cost

def sigmoid(z):
    return 1/(1+np.e**(-z))

def predict(X, theta):
    z = np.dot(X, theta)
    return sigmoid(z)

Here is the gradient descent function:

def gradient_descent(X, Y, theta, rate):
    m = len(Y)
    h = predict(X, theta)

    gradient = rate * np.dot(X.T, (h-Y)) / m
    theta -= gradient
    return theta

Then I use this train function to call both of them over n iterations:

def train(X, Y, theta, rate, iters):
    cost_history = []

    for i in range(iters):
        theta = gradient_descent(X, Y, theta, rate)

        cost = cost_function(X, Y, theta)
        cost_history.append(cost)

        if i % 100 == 0:
            print("iter: " + str(i) + " cost: " + str(cost))
    return theta, cost_history

At the end of this, I get a cost history that looks like this:

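This is what I'm having trouble understanding. Why is it negative? Is there a problem with the code or the data, or is this how it's supposed to work and I'm missing something? I've spent the last day trying to figure it out but haven't gotten anywhere. With just these features, the weights trained by the functions above still correctly predict the outcome of a fight about 54% of the time on the test set, but the cost is negative.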

Answer 1

Okay, after some more troubleshooting I found the problem. I'm not sure why it causes the issue, but fixing it brings my cost function back to normal.

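The Y vector's dtype is uint8, and that apparently causes the problem. Changing it to int64 fixed everything. Sorry, I don't know why it causes the issue, but if I find out I'll edit it into this answer.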
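A likely explanation, sketched here rather than confirmed by the original poster: cost_function computes -Y, and negating an unsigned 8-bit array wraps around modulo 256, so every label 1 becomes 255 instead of -1. The first term of the cost then becomes 255 * log(h), which is a large negative number, whereas each term of a correct cross-entropy cost is non-negative. The snippet below reproduces this on a small made-up label vector (not the data from the question); cost_from_predictions is a hypothetical helper that applies the question's formula to predictions passed in directly.

```python
import numpy as np

# Made-up labels stored as uint8, the dtype reported in the question.
Y = np.array([[0], [1], [1], [0]], dtype=np.uint8)

# Unary minus on an unsigned array wraps modulo 256, so 1 turns into 255.
print((-Y).ravel())

# With theta = 0, every prediction is sigmoid(0) = 0.5.
h = np.full(Y.shape, 0.5)

def cost_from_predictions(Y, h):
    """Same formula as cost_function in the question, but h is passed in."""
    m = len(Y)
    return (np.dot(-Y.T, np.log(h)) - np.dot((1 - Y).T, np.log(1 - h))) / m

# uint8 labels: the wrapped 255 multiplies log(h), giving a negative cost.
bad = cost_from_predictions(Y, h).item()

# Float (or any signed) labels: negation gives -1, so the cost is log(2).
good = cost_from_predictions(Y.astype(np.float64), h).item()

print(bad, good)
```

This would also explain why training still produced usable weights: the gradient uses h - Y, and subtracting a uint8 array from a float array promotes to float, so only the reported cost is affected. Casting Y once where it is created, for example Y = Y.astype(np.float64), avoids the wrap-around; int64 works for the same reason, because it is signed.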

Why am I getting a negative cost function for logistic regression using gradient descent in Python?
