I am trying the following code:
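# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split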
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model
model = linear_model.LogisticRegression()
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score
X=scaler.fit_transform(X)
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2)
model.fit(X_train,y_train)
# Make predictions using the testing set
powerOutput_y_pred = model.predict(X_test)
print (powerOutput_y_pred)
# The coefficients
print('Coefficients: \n', model.coef_)
# The mean squared error
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, powerOutput_y_pred))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % r2_score(y_test, powerOutput_y_pred))
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, powerOutput_y_pred, color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
But I get the following error from the scatter plot:
ValueError: x and y must be the same size
If I run df.head(), I get the following table:
X and y are defined as follows:
X=df.values[:,[0,1,2,3,4,5,7]]
y=df.values[:,6]
Running X.shape gives (25,7) and y.shape gives (25,) as output. So how do I fix this shape mismatch?
Answer 0 (score: 2)
Just use plot instead of scatter:
plt.plot(X_test, y_test, ls="none", marker='.', ms=12)
This plots each set of x data (each column of X_test) against the same set of y data. It assumes x.shape == (n,d) and y.shape == (n,), as in the question above.
Alternatively, iterate over the columns of the x values and call scatter once per column:
# features is the number of columns in X_test (one color per feature)
colors = plt.cm.viridis(np.linspace(0.0, 1.0, features))
for xcol, c in zip(X_test.T, colors):
    plt.scatter(xcol, y_test, c=c)
Setting c from the colors array draws each feature in a different color on the scatter plot. If you want them all to be black, just pass c='black' instead.
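scatter expects a list of x values and a list of y values. This is simplest when x and y are both 1D. However, you can also plot multiple sets of x and y data stored in 2D arrays, provided those arrays have matching shapes.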
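As a rough illustration of the matching-shapes case, the sketch below repeats y once per feature column so that it has the same shape as X before calling scatter. The arrays X_demo and y_demo are made-up stand-ins for the question's data, and the Agg backend is selected only so the snippet runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical stand-ins with the shapes from the question: (25, 7) and (25,)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(25, 7))
y_demo = rng.normal(size=25)

# Repeat y across the feature axis: (25,) -> (25, 1) -> (25, 7)
y_2d = np.tile(y_demo[:, np.newaxis], (1, X_demo.shape[1]))

fig, ax = plt.subplots()
# Both inputs now hold 25 * 7 values, so scatter accepts them
points = ax.scatter(X_demo, y_2d, color="black")
print(X_demo.shape, y_2d.shape)
```

Because scatter flattens its inputs, all features end up in a single collection with one color, which is one reason the per-column loop may be preferable when features need to be told apart.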
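The scatter documentation says: "Fundamentally, scatter works with 1-D arrays; x, y, s, and c may be input as 2-D arrays, but within scatter they will be flattened."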
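That is a bit vague, but a dive into the Matplotlib source code confirms that the shapes of x and y must match exactly. The code that handles shapes in plot is more flexible, so with that function you can plot one set of y data against many sets of x data.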
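The difference between the two functions can be checked with a small sketch like the one below. X_demo and y_demo are hypothetical arrays with the shapes from the question, and the Agg backend is chosen only so the snippet runs headless.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical data with the question's shapes: X is (25, 7), y is (25,)
X_demo = np.arange(25 * 7, dtype=float).reshape(25, 7)
y_demo = np.arange(25, dtype=float)

fig, ax = plt.subplots()

# scatter compares the total number of elements (175 vs 25) and refuses
try:
    ax.scatter(X_demo, y_demo)
    scatter_error = None
except ValueError as exc:
    scatter_error = str(exc)
print("scatter:", scatter_error)

# plot only needs the first dimensions to agree and returns one Line2D per column
lines = ax.plot(X_demo, y_demo, ls="none", marker=".")
print("plot created", len(lines), "line objects")
```

Each of the returned line objects corresponds to one feature column, which is why plot cycles through a different color per feature by default.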
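Normally plot draws lines rather than points, but you can turn the lines off by setting ls (i.e. linestyle) to "none", and then turn the points on by setting marker. ms (i.e. markersize) controls the size of the points.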
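To make the abbreviations concrete, here is a minimal sketch that draws the same points twice, once with the short keyword names and once with the long ones. The data values are invented for illustration, and the Agg backend is used only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt

# Made-up sample points
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = [1.0, 0.5, 2.0, 1.5]

fig, ax = plt.subplots()

# Short aliases: ls = linestyle, ms = markersize
(short_form,) = ax.plot(x_demo, y_demo, ls="none", marker=".", ms=12)

# The same call spelled out with the full keyword names
(long_form,) = ax.plot(x_demo, y_demo, linestyle="none", marker=".", markersize=12)

print(short_form.get_linestyle() == long_form.get_linestyle())
print(short_form.get_markersize() == long_form.get_markersize())
```

Either spelling works; the long names are arguably easier to read in shared code.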
The example you posted above does not run (X and y are undefined), but here is a complete example with output:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from sklearn import datasets
from sklearn.model_selection import train_test_split
d = datasets.load_diabetes()
features = d.data.shape[1]
X = d.data[:50,:]
Y = d.target[:50]
sample_weight = np.random.RandomState(442).rand(Y.shape[0])
# split train, test for calibration
X_train, X_test, Y_train, Y_test, sw_train, sw_test = \
    train_test_split(X, Y, sample_weight, test_size=0.9, random_state=442)
# use the plot function instead of scatter
# plot one set of y data against several sets of x data
plt.plot(X_test, Y_test, ls="none", marker='.', ms=12)
# call .scatter() multiple times in a loop
#colors = plt.cm.viridis(np.linspace(0.0, 1.0, features))
#for xcol, c in zip(X_test.T, colors):
#    plt.scatter(xcol, Y_test, c=c)
Output: