Error when plotting a decision boundary with Matplotlib

Time: 2018-11-19 16:55:18

Tags: python matplotlib scikit-learn

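I recently wrote a Logistic Regression model using scikit-learn. However, I am having a hard time plotting the decision boundary line. I explicitly multiply the coefficients and the intercept and plot the result, which produces an incorrect figure.

Could someone point me in the right direction on how to plot the decision boundary?

Is there an easier way to draw the line without having to multiply the coefficients and intercept by hand?

Thanks a million!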

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

#Import Dataset
dataset = pd.read_csv("Students Exam Dataset.txt", names=["Exam 1", "Exam 2", "Admitted"])
print(dataset.head())

#Visualizing Dataset
positive = dataset[dataset["Admitted"] == 1]
negative = dataset[dataset["Admitted"] == 0]

plt.scatter(positive["Exam 1"], positive["Exam 2"], color="blue", marker="o", label="Admitted")
plt.scatter(negative["Exam 1"], negative["Exam 2"], color="red", marker="x", label="Not Admitted")
plt.title("Student Admission Plot")
plt.xlabel("Exam 1")
plt.ylabel("Exam 2")
plt.legend()
plt.plot()
plt.show()

#Preprocessing Data
col = len(dataset.columns)
x = dataset.iloc[:,0:col].values
y = dataset.iloc[:,col-1:col].values
print(f"X Shape: {x.shape}   Y Shape: {y.shape}")

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1306)

#Initialize Model
reg = LogisticRegression()
reg.fit(x_train, y_train)

#Output
predictions = reg.predict(x_test)
accuracy = accuracy_score(y_test, predictions) * 100
coeff = reg.coef_
intercept = reg.intercept_
print(f"Accuracy Score : {accuracy} %")
print(f"Coefficients = {coeff}")
print(f"Intercept Coefficient = {intercept}")

#Visualizing Output
xx = np.linspace(30,100,100)
decision_boundary = (coeff[0,0] * xx + intercept.item()) / coeff[0,1]
plt.scatter(positive["Exam 1"], positive["Exam 2"], color="blue", marker="o", label="Admitted")
plt.scatter(negative["Exam 1"], negative["Exam 2"], color="red", marker="x", label="Not Admitted")
plt.plot(xx, decision_boundary, color="green", label="Decision Boundary")
plt.title("Student Admission Plot")
plt.xlabel("Exam 1")
plt.ylabel("Exam 2")
plt.legend()
plt.show()

Dataset: Student Dataset.txt

1 Answer:

Answer 0: (score: 0)

  

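"Is there an easier way to draw the line without having to multiply the coefficients and intercept by hand?"

Yes. If you don't need to build it from scratch, the mlxtend package has an excellent implementation for plotting decision boundaries from scikit-learn classifiers. Its documentation is extensive, and it installs easily with pip install mlxtend.

First, a note on the Preprocessing block of the code you posted:
 1. x should not include the class label.
 2. y should be a 1d array.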

#Preprocessing Data
col = len(dataset.columns)
x = dataset.iloc[:,0:col-1].values # assumes your labels are always in the final column.
y = dataset.iloc[:,col-1:col].values
y = y.reshape(-1) # convert to 1d

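Once the split is redone and reg is refit on the corrected x and y (the original reg was trained on three columns, including the label), plotting is as simple as:

from mlxtend.plotting import plot_decision_regions

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1306)
reg.fit(x_train, y_train) # refit on the two feature columns only
plot_decision_regions(x, y,
                      X_highlight=x_test,
                      clf=reg,
                      legend=2)
plt.show()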

This particular plot highlights the x_test data points by circling them.
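If you would still like to draw the line by hand, note that the boundary is the set of points where coeff[0,0]*x1 + coeff[0,1]*x2 + intercept = 0. Solving for x2 gives a leading minus sign, which the expression in the question lacks. Below is a minimal sketch of that calculation. It assumes a binary model fit on exactly two features; the helper name boundary_line and the synthetic X_demo / y_demo data are made up for illustration and stand in for the exam scores.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def boundary_line(clf, x1):
    """Return x2 on the line w1*x1 + w2*x2 + b = 0 for a fitted
    binary linear classifier trained on two features.
    (Hypothetical helper, not part of scikit-learn.)"""
    w1, w2 = clf.coef_[0]
    b = clf.intercept_[0]
    # Solve w1*x1 + w2*x2 + b = 0 for x2; note the leading minus sign.
    return -(w1 * x1 + b) / w2


# Synthetic stand-in for the two exam scores (assumption: not the real dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(30, 100, size=(200, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 130).astype(int)  # 1d integer labels

clf_demo = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

xx = np.linspace(30, 100, 100)   # Exam 1 values
yy = boundary_line(clf_demo, xx)  # matching Exam 2 values on the boundary
```

With your own model, boundary_line(reg, xx) should give the values to pass to plt.plot(xx, ..., color="green", label="Decision Boundary") on top of the scatter plot, provided reg was fit on the two score columns only.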
