Question

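I'm currently using the sigmoid function and a cost function to determine where the line should sit on a scatter plot. The x and y arrays are plotted on the scatter plot using the correct keys, but the line does not appear in the plot.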

The code that determines theta is as follows:

import numpy as np
import scipy.optimize as opt
import matplotlib.pyplot as plt
# proc is a project-specific data-loading module (not shown)

def loadTrgDf():
    train, test = proc.getTrainingData()

    # x holds the SibSp and Parch columns
    x = train.iloc[:, 2:4]

    # y holds the actual outcome (Survived)
    y = train.iloc[:, 1]

    # Start with one zero weight per feature
    initial_theta = np.zeros(x.shape[1])

    # Calculate theta
    theta = opt.fmin_cg(cost, initial_theta, cost_gradient, (x, y))

    print(" ")
    print(theta)

    # Store for readability
    sibSpTheta = theta[0]
    parchTheta = theta[1]

The results are then drawn on a scatter plot here:

    # Plot findings
    fig, ax = plt.subplots()

    for index, row in train.iterrows():
        if row['Survived'] == 1:
            ax.scatter(row['SibSp'], row['Parch'], marker="+", c='green')
        else:
            ax.scatter(row['SibSp'], row['Parch'], marker="x", c='red', linewidth=1)

    plt.title("Survival Rate", fontsize=16)
    plt.xlabel("Spouses", fontsize=14)
    plt.ylabel("Siblings", fontsize=14)

    plt.legend(["survived", "not survived"])

    plt.show()

    x_axis = np.array([x.min(), x.max()])
    y_axis = (-1 / 1) * (sibSpTheta * x_axis + parchTheta)
    ax.plot(x_axis, y_axis, linewidth=2)
    fig
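One likely reason the line was missing at first is ordering: in the code above, ax.plot is called after plt.show(), so the figure has already been displayed (and, in a script, closed) by the time the line is added. Separately, x.min() on a DataFrame returns one minimum per column, so np.array([x.min(), x.max()]) is a 2×2 array rather than two endpoints; a single column such as x['SibSp'] avoids that. The sketch below is a minimal illustration under those assumptions: the points are a few (SibSp, Parch) pairs from the sample data, and the line itself is a placeholder, not the fitted boundary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.scatter([0, 1, 3], [0, 0, 2], marker="x", c="red")  # a few (SibSp, Parch) points

# Draw the line on the same Axes *before* the figure is shown or saved.
x_axis = np.array([0.0, 3.0])  # two endpoints, not a 2x2 array
y_axis = 0.5 * x_axis          # placeholder line; substitute the real decision boundary
ax.plot(x_axis, y_axis, linewidth=2)

ax.set_title("Survival Rate", fontsize=16)
fig.canvas.draw()  # render now; in an interactive session call plt.show() here instead
```

In an interactive session, the same rule applies: every ax.scatter and ax.plot call goes before plt.show().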

opt.fmin_cg is called with the following functions:

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, x, y):
    predictions = sigmoid(x @ theta)
    # sigmoid can round to exactly 1.0, and log(1 - 1) = log(0) = -inf breaks the optimization
    predictions[predictions == 1] = 0.5
    error = -y * np.log(predictions) - (1 - y) * np.log(1 - predictions)
    return sum(error) / len(y)

def cost_gradient(theta, x, y):
    predictions = sigmoid(x @ theta)
    return x.transpose() @ (predictions - y) / len(y)
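A quick way to make sure cost and cost_gradient agree before handing them to fmin_cg is scipy.optimize.check_grad, which compares the analytic gradient with a finite-difference estimate. The sketch below is an assumption-laden variant, not the asker's code: it clips predictions away from 0 and 1 with np.clip instead of replacing exact 1s with 0.5, and the test matrix is five rows from the sample data with a leading column of ones added for an intercept.

```python
import numpy as np
from scipy.optimize import check_grad

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, x, y):
    # Clip so neither log term ever sees exactly 0.
    p = np.clip(sigmoid(x @ theta), 1e-12, 1 - 1e-12)
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))

def cost_gradient(theta, x, y):
    p = sigmoid(x @ theta)
    return x.T @ (p - y) / len(y)

# Columns: intercept (ones), SibSp, Parch; rows taken from the sample data.
x = np.array([[1.0, 0, 0], [1, 1, 0], [1, 0, 0], [1, 3, 2], [1, 1, 1]])
y = np.array([0.0, 0, 1, 0, 0])
theta = np.array([0.1, -0.2, 0.3])

# Norm of the difference between cost_gradient and a numerical gradient of cost;
# it should be very small if the two functions are consistent.
err = check_grad(cost, cost_gradient, theta, x, y)
```

A large err here usually means the gradient formula and the cost formula disagree, which would make fmin_cg converge to a poor theta.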

Sample of the data:

   PassengerId  Survived  SibSp  Parch
77            78         0      0      0
748          749         0      1      0
444          445         1      0      0
361          362         0      1      0
576          577         1      0      0
27            28         0      3      2
232          233         0      0      0
424          425         0      1      1
785          786         0      0      0
...          ...        ...    ...    ...

x contains the independent variables (IVs) SibSp and Parch.

y contains the dependent variable (DV) Survived.
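Assuming train has exactly the four columns shown in the sample, train.iloc[:, 2:4] does pick SibSp and Parch. Selecting by column name makes that explicit and keeps working if the column order changes. The sketch below rebuilds a few of the sample rows to show this.

```python
import pandas as pd

# A few rows copied from the sample data above.
train = pd.DataFrame(
    {
        "PassengerId": [78, 749, 445, 28, 425],
        "Survived": [0, 0, 1, 0, 0],
        "SibSp": [0, 1, 0, 3, 1],
        "Parch": [0, 0, 0, 2, 1],
    },
    index=[77, 748, 444, 27, 424],
)

# Select features (IVs) and target (DV) by name rather than by position.
x = train[["SibSp", "Parch"]]
y = train["Survived"]
```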

This is the unexpected output:

This is the expected output:

Edit: The line shows up now! However, it is inaccurate.

Answer 1

The problem is not the plotting but the regression concept.

y_axis = (-1 / 1) * (sibSpTheta * x_axis + parchTheta)

It comes from a calculation that looks like this:

weights * features = weight0 + weight1 * feature1 + weight2 * feature2 + ...
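Because sigmoid(z) ≥ 0.5 exactly when z ≥ 0, the line that separates "survived" from "not survived" is where this weighted sum equals zero. With weight0 as the free weight, feature1 = SibSp and feature2 = Parch, solving for Parch gives the line to plot (assuming weight2 ≠ 0):

$$
w_0 + w_1 \cdot \mathrm{SibSp} + w_2 \cdot \mathrm{Parch} = 0
\quad\Longrightarrow\quad
\mathrm{Parch} = -\frac{w_0 + w_1 \cdot \mathrm{SibSp}}{w_2}
$$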

You need a weight that doesn't multiply any feature value (a free weight, i.e. the intercept), so the code becomes:

freeWeight = theta[0]
sibSpTheta = theta[1]
parchTheta = theta[2]

y_axis = (-1 / parchTheta) * (freeWeight + sibSpTheta * x_axis)

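You can do this by adding an extra column that doesn't correspond to any feature but holds a constant dummy value (conventionally 1) when you assemble the data frame. This is usually called adding a bias (or intercept) term.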
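A minimal sketch of the whole idea, under stated assumptions: the data are synthetic (two random features standing in for SibSp and Parch, with noisy labels so the classes are not perfectly separable), the cost clips predictions instead of replacing 1s with 0.5, and the variable names are illustrative. A column of ones is prepended so theta[0] becomes the free weight, and the boundary is solved for the second feature.

```python
import numpy as np
import scipy.optimize as opt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, x, y):
    p = np.clip(sigmoid(x @ theta), 1e-12, 1 - 1e-12)
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))

def cost_gradient(theta, x, y):
    return x.T @ (sigmoid(x @ theta) - y) / len(y)

# Synthetic, illustration-only data (not the Titanic data).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
noise = rng.normal(size=200)
y = (features[:, 0] + 0.5 * features[:, 1] + 0.3 + noise > 0).astype(float)

# Prepend a column of ones; its weight theta[0] is the intercept ("free weight").
x = np.column_stack([np.ones(len(features)), features])

theta = opt.fmin_cg(cost, np.zeros(x.shape[1]), cost_gradient, args=(x, y), disp=False)
free_weight, sibsp_theta, parch_theta = theta

# Decision boundary: free_weight + sibsp_theta*x1 + parch_theta*x2 = 0, solved for x2.
x_axis = np.array([features[:, 0].min(), features[:, 0].max()])
y_axis = -(free_weight + sibsp_theta * x_axis) / parch_theta
```

The resulting x_axis and y_axis can be passed to ax.plot on the same Axes as the scatter points.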

Moving on to the x and + markers: you need to loop over the x data frame, not the full train data frame.
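As an alternative to looping row by row, boolean masks let you draw each class with a single scatter call. This also fixes the legend: in the question's loop every row creates its own artist, so plt.legend(["survived", "not survived"]) labels the first two artists drawn, which can both belong to the same class. The sketch below uses a few rows from the sample data; the column names are the ones shown there.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

train = pd.DataFrame(
    {
        "Survived": [0, 0, 1, 0, 0],
        "SibSp": [0, 1, 0, 3, 1],
        "Parch": [0, 0, 0, 2, 1],
    }
)

survived = train["Survived"] == 1
fig, ax = plt.subplots()
# One scatter call per class, so each class gets exactly one legend entry.
ax.scatter(train.loc[survived, "SibSp"], train.loc[survived, "Parch"],
           marker="+", c="green", label="survived")
ax.scatter(train.loc[~survived, "SibSp"], train.loc[~survived, "Parch"],
           marker="x", c="red", linewidth=1, label="not survived")
ax.legend()
```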

Why doesn't the sigmoid function show the line on the scatter plot?
