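I am currently using a sigmoid function and a cost function to determine where the line should sit on the scatter plot. However, when I plot the output, the x and y arrays are drawn on the scatter plot using the correct keys, but the line itself does not appear.
The code that determines theta is as follows: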
import numpy as np
from scipy import optimize as opt

def loadTrgDf():
    train, test = proc.getTrainingData()
    # x: the SibSp and Parch columns
    x = train.iloc[:, 2:4]
    # y: the actual outcome (Survived)
    y = train.iloc[:, 1]
    # Start from all-zero weights, one per feature column
    initial_theta = np.zeros(x.shape[1])
    # Calculate theta
    theta = opt.fmin_cg(cost, initial_theta, cost_gradient, (x, y))
    print(" ")
    print(theta)
    # Store for readability
    sibSpTheta = theta[0]
    parchTheta = theta[1]
The findings are then plotted on a scatter plot here:
# Plot findings
fig, ax = plt.subplots()
for index, row in train.iterrows():
    if row['Survived'] == 1:
        ax.scatter(row['SibSp'], row['Parch'], marker="+", c='green')
    else:
        ax.scatter(row['SibSp'], row['Parch'], marker="x", c='red', linewidth=1)
plt.title("Survival Rate", fontsize=16)
plt.xlabel("Spouses", fontsize=14)
plt.ylabel("Siblings", fontsize=14)
plt.legend(["survived", "not survived"])
plt.show()
x_axis = np.array([x.min(), x.max()])
y_axis = (-1 / 1) * (sibSpTheta * x_axis + parchTheta)
ax.plot(x_axis, y_axis, linewidth=2)
fig
The opt.fmin_cg call uses the following functions:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, x, y):
    predictions = sigmoid(x @ theta)
    # A prediction of exactly 1 makes log(1 - predictions) = log(0) = -inf during optimization
    predictions[predictions == 1] = 0.5
    error = -y * np.log(predictions) - (1 - y) * np.log(1 - predictions)
    return sum(error) / len(y)

def cost_gradient(theta, x, y):
    predictions = sigmoid(x @ theta)
    return x.transpose() @ (predictions - y) / len(y)
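Replacing predictions of exactly 1 with 0.5 only patches one symptom (a prediction of exactly 0 would still hit log(0)). A common alternative, sketched here as an assumption rather than as a drop-in replacement for the code above, is to clip predictions away from 0 and 1 before taking logs; the `eps` parameter is a hypothetical name chosen for this sketch:

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, x, y, eps=1e-12):
    """Mean cross-entropy loss for logistic regression.

    Predictions are clipped to [eps, 1 - eps] so that neither log(p)
    nor log(1 - p) ever receives an exact 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = np.clip(sigmoid(x @ theta), eps, 1.0 - eps)
    return float(np.mean(-y * np.log(p) - (1.0 - y) * np.log(1.0 - p)))

def cost_gradient(theta, x, y):
    """Gradient of the (unclipped) mean cross-entropy loss w.r.t. theta."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return x.T @ (sigmoid(x @ theta) - y) / len(y)
```

Converting `x` and `y` with `np.asarray` also means a pandas DataFrame or Series can be passed straight in without index-alignment surprises.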
Values:
     PassengerId  Survived  SibSp  Parch
77            78         0      0      0
748          749         0      1      0
444          445         1      0      0
361          362         0      1      0
576          577         1      0      0
27            28         0      3      2
232          233         0      0      0
424          425         0      1      1
785          786         0      0      0
...          ...       ...    ...    ...
x contains the independent variables (IVs) SibSp and Parch.
y contains the dependent variable (DV) Survived.
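As a small illustration of that split, here is a sketch that rebuilds the nine sample rows shown above as a DataFrame and selects x and y by column name instead of by position. The frame `sample` is only a stand-in for the training data returned by `proc.getTrainingData()`, which is not shown:

```python
import pandas as pd

# The nine sample rows printed above, with the same index values.
sample = pd.DataFrame(
    {
        "PassengerId": [78, 749, 445, 362, 577, 28, 233, 425, 786],
        "Survived": [0, 0, 1, 0, 1, 0, 0, 0, 0],
        "SibSp": [0, 1, 0, 1, 0, 3, 0, 1, 0],
        "Parch": [0, 0, 0, 0, 0, 2, 0, 1, 0],
    },
    index=[77, 748, 444, 361, 576, 27, 232, 424, 785],
)

# Selecting by name gives the same result as iloc[:, 2:4] and iloc[:, 1]
# here, but keeps working if the column order ever changes.
x = sample[["SibSp", "Parch"]].to_numpy(dtype=float)  # independent variables
y = sample["Survived"].to_numpy(dtype=float)          # dependent variable
```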
This is the unexpected output:
This is the expected output:
Edit: The line appears now! However, it is inaccurate.
Answer 0 (score: 1)
The problem is not the plotting but the regression concept.
y_axis = (-1 / 1) * (sibSpTheta * x_axis + parchTheta)
This comes from a calculation that looks like this:
weights * features = weight0 + weight1 * feature1 + weight2 * feature2 + ...
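Spelled out for the two features in the question (the symbols θ0 for the free weight, x1 for SibSp and x2 for Parch are introduced here for illustration), the line that gets drawn is where the sigmoid output equals 0.5:

```latex
\begin{aligned}
h_\theta(x) &= \sigma(\theta_0 + \theta_1 x_1 + \theta_2 x_2), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
h_\theta(x) = 0.5 &\iff \theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0 \\
&\iff x_2 = -\frac{\theta_0 + \theta_1 x_1}{\theta_2} \qquad (\theta_2 \neq 0)
\end{aligned}
```

Without θ0 the last line reduces to x2 = -(θ1/θ2)·x1, so the boundary is forced through the origin no matter what the data looks like.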
You need to add a weight that does not correspond to any feature value (a bias or intercept weight), so the code becomes:
freeWeight = theta[0]
sibSpTheta = theta[1]
parchTheta = theta[2]
# Boundary: freeWeight + sibSpTheta * x1 + parchTheta * x2 = 0, solved for x2 (Parch)
y_axis = (-1 / parchTheta) * (sibSpTheta * x_axis + freeWeight)
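You can do this by adding an extra column that does not correspond to any feature but holds a constant dummy value (usually 1) when assembling the feature data frame. This is usually called adding a bias (intercept) term.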
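A minimal sketch of that idea, assuming the features arrive as a two-column array (SibSp, Parch) and keeping the question's sigmoid/cost/gradient structure. The helper names `add_bias_column` and `fit_logistic` are hypothetical, and the cost clips predictions instead of using the `== 1` replacement:

```python
import numpy as np
from scipy import optimize as opt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, x, y, eps=1e-12):
    # Mean cross-entropy; clipping keeps both logs finite.
    p = np.clip(sigmoid(x @ theta), eps, 1.0 - eps)
    return float(np.mean(-y * np.log(p) - (1.0 - y) * np.log(1.0 - p)))

def cost_gradient(theta, x, y):
    return x.T @ (sigmoid(x @ theta) - y) / len(y)

def add_bias_column(features):
    """Prepend a column of ones so theta[0] acts as the free (bias) weight."""
    features = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(len(features)), features])

def fit_logistic(features, target):
    """Return theta = [bias, SibSp weight, Parch weight] found by fmin_cg."""
    x = add_bias_column(features)
    y = np.asarray(target, dtype=float)
    initial_theta = np.zeros(x.shape[1])  # one weight per column, bias included
    return opt.fmin_cg(cost, initial_theta, fprime=cost_gradient,
                       args=(x, y), disp=False)
```

With the question's data this would be called as `theta = fit_logistic(x, y)`; `theta[0]` is then the free weight, `theta[1]` the SibSp weight and `theta[2]` the Parch weight.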
As for the x and + markers: you need to loop over the x data frame, not the full train data frame.
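Putting the plotting advice together, here is a sketch (the helper names `boundary_points` and `plot_survival` are hypothetical) that draws both classes with boolean masks over the feature array instead of `iterrows()`, spans the line over the SibSp range only, and adds the line before the figure is shown. It assumes `theta` holds the bias weight first, as in the three-weight layout above:

```python
import numpy as np
import matplotlib.pyplot as plt

def boundary_points(theta, x1_values):
    """Points on theta0 + theta1*x1 + theta2*x2 = 0, solved for x2."""
    x1 = np.asarray(x1_values, dtype=float)
    return x1, -(theta[0] + theta[1] * x1) / theta[2]

def plot_survival(features, target, theta):
    """Scatter SibSp vs Parch by outcome and overlay the decision boundary."""
    features = np.asarray(features, dtype=float)
    survived = np.asarray(target) == 1
    fig, ax = plt.subplots()
    # One scatter call per class, selected with boolean masks.
    ax.scatter(features[survived, 0], features[survived, 1],
               marker="+", c="green", label="survived")
    ax.scatter(features[~survived, 0], features[~survived, 1],
               marker="x", c="red", label="not survived")
    # Span the line over the SibSp column (column 0) only.
    x_line, y_line = boundary_points(
        theta, [features[:, 0].min(), features[:, 0].max()])
    ax.plot(x_line, y_line, linewidth=2, label="decision boundary")
    ax.set_xlabel("SibSp")
    ax.set_ylabel("Parch")
    ax.set_title("Survival Rate")
    ax.legend()
    return fig, ax
```

Call `plt.show()` only after `plot_survival` returns. In the question, `plt.show()` runs before `ax.plot(...)`, so in a plain script the line is likely added to a figure that has already been displayed. The axis labels use the column names because, in the Titanic data, SibSp counts siblings/spouses and Parch counts parents/children.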