对于noob问题抱歉...这是我的代码:
from __future__ import division
import sklearn
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
X =np.array([6,8,10,14,18])
Y = np.array([7,9,13,17.5,18])
X = np.reshape(X,(1,5))
Y = np.reshape(Y,(1,5))
print X
print Y
plt.figure()
plt.title('Pizza Price as a function of Pizza Diameter')
plt.xlabel('Pizza Diameter (Inches)')
plt.ylabel('Pizza Price (Dollars)')
axis = plt.axis([0, 25, 0 ,25])
m, b = np.polyfit(X,Y,1)
plt.grid(True)
plt.plot(X,Y, 'k.')
plt.plot(X, m*X + b, '-')
#plt.show()
#training data
#x= [[6],[8],[10],[14],[18]]
#y= [[7],[9],[13],[17.5],[18]]
# create and fit linear regression model
model = LinearRegression()
model.fit(X,Y)
print 'A 12" pizza should cost $% .2f' % model.predict(19)
#work out cost function, which is residual sum of squares
print 'Residual sum of squares: %.2f' % np.mean((model.predict(x)- y) ** 2)
#work out variance (AKA Mean squared error)
xMean = np.mean(x)
print 'Variance is: %.2f' %np.var([x], ddof=1)
#work out covariance (this is whether the x axis data and y axis data correlate with eachother)
#When a and b are 1-dimensional sequences, numpy.cov(x,y)[0][1] calculates covariance
print 'Covariance is: %.2f' %np.cov(X, Y, ddof = 1)[0][1]
#test the model on new test data, printing the r squared coefficient
X_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]
print 'R squared for model on test data is: %.2f' %model.score(X_test,y_test)
基本上,这些函数中的一些适用于我称为X和Y的变量,而有些函数则不适用。
例如,正如代码所示,它会抛出此错误:
TypeError: expected 1D vector for x
为行
m, b = np.polyfit(X,Y,1)
然而,当我注释掉这两行重塑这样的变量时:
#X = np.reshape(X,(1,5))
#Y = np.reshape(Y,(1,5))
我收到错误:
ValueError: Found input variables with inconsistent numbers of samples: [1, 5]
就行了
model.fit(X,Y)
那么,我如何让数组适用于我的脚本中的所有函数,而不会有相同数据的不同数组,结构略有不同?
感谢您的帮助!