Plotting regression residuals for multiple independent variables with matplotlib

Time: 2017-03-22 17:59:40

Tags: python matplotlib machine-learning scikit-learn regression

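How can I plot residuals for multiple independent variables? My dataset has 49 features and 2251 rows. My target variable is a number between 0 and 1, so I am using regression. I used a feature selection method to pick the 10 most important features, so instead of using all 48 independent variables I want to focus on 10. The problem: I don't know how to plot the residuals for those 10 independent variables.

My feature selection algorithm selected the following 10 features: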

'Dec','Fog-Rain','Max_Sea_Level_PressureIn','Mean_Sea_Level_PressureIn','Min_Sea_Level_PressureIn','NormalizedCC','Outlier_CC_D','Rain','Snow_flg','Rain-Thunderstorm'

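I would like to plot the residuals for these 10 features. I want either 10 separate plots/figures or, to show all 10 independent variables in a single chart, each variable drawn in a different color. So instead of X2_test (which doesn't work because it is 10 by 2251 rather than 1 by 2251) I want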

X2_test['One of the 10 most important columns']
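One way to select a single column by name is to keep the test split as a DataFrame instead of converting it with np.array, since a NumPy array only accepts positional indices. A minimal sketch with made-up data follows; the shortened `features` list and the random values are placeholders, not the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for Frequency_Data.csv; values are made up for illustration.
rng = np.random.default_rng(0)
features = ['Dec', 'Rain', 'NormalizedCC']  # shortened feature list (assumption)
df = pd.DataFrame(rng.random((20, 3)), columns=features)
df['freq'] = rng.random(20)

test = df[15:]

# Keeping the DataFrame preserves column names, so label-based selection works.
X2_test = test[features]
dec_by_name = X2_test['Dec'].to_numpy()

# After np.array(...) only positional indexing is available.
X2_test_arr = np.array(test[features])
dec_by_position = X2_test_arr[:, features.index('Dec')]

# Both routes yield the same values for the 'Dec' column.
same_values = np.array_equal(dec_by_name, dec_by_position)
```

scikit-learn estimators generally accept a DataFrame directly, so the conversion to an array is usually not needed for fitting or predicting.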

For example, when I try this:

import pandas as pd 
import numpy as np
from pandas import read_csv
from sklearn.feature_selection import RFE
from sklearn import linear_model
from sklearn import preprocessing
from matplotlib import pyplot
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import explained_variance_score
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import median_absolute_error
from sklearn.metrics import r2_score

df = pd.read_csv('Frequency_Data.csv')
'''
New model
'''
model2 = linear_model.LinearRegression()
train= df[:735]
test = df[735:]
X2_train=np.array(train[['Dec','Fog-Rain','Max_Sea_Level_PressureIn','Mean_Sea_Level_PressureIn','Min_Sea_Level_PressureIn','NormalizedCC','Outlier_CC_D','Rain','Snow_flg','Rain-Thunderstorm']])
X2_test=np.array(test[['Dec','Fog-Rain','Max_Sea_Level_PressureIn','Mean_Sea_Level_PressureIn','Min_Sea_Level_PressureIn','NormalizedCC','Outlier_CC_D','Rain','Snow_flg','Rain-Thunderstorm']])
y2_train = np.array(train['freq'])
y2_test = np.array(test['freq'])
model2.fit(X2_train, y2_train)
y2_pred = model2.predict(X2_test)

print('\n Accuracy')
print('-------------------------------------------------------------------')     
print 'Regression Accuracy: '+str(model2.score(X2_test,y2_test))

print('\n Model Evaluation:')
print('-------------------------------------------------------------------')
print'Explained Variance: ' + str(explained_variance_score(y2_test, y2_pred))
print'Mean Absloute sq Error: '+str(mean_absolute_error(y2_test, y2_pred))
print'Mean sq Error: '+str(mean_squared_error(y2_test, y2_pred))
print'Median Absolute Error: '+str(median_absolute_error(y2_test, y2_pred))
#print'R2: '+str(r2_score(y2_test, y2_pred))
error = y_test - y2_pred
import matplotlib.pyplot as plt
# Plot outputs
plt.scatter(X2_test['Dec'], y2_test,  color='black')
plt.plot(X2_test['Dec'], model2.predict(X2_test['Dec']), color='blue',
     linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()

'Dec' is one of the 10 important features. When I try to plot it, I get the following error:

ValueError: operands could not be broadcast together with shapes (563,) (1516,) 
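The shape (1516,) matches the size of the test split (2251 - 735 rows), so the (563,) operand possibly comes from `y_test` in the line `error = y_test - y2_pred`; `y_test` is not defined in the snippet shown and may be left over from an earlier model. That is a guess from the shapes alone. The sketch below reproduces the message with placeholder arrays and then computes residuals from arrays belonging to the same split.

```python
import numpy as np

# Placeholder arrays with the shapes reported in the error message.
y_test = np.zeros(563)     # assumed: leftover target from a different split
y2_test = np.zeros(1516)   # target for the current test split (2251 - 735 rows)
y2_pred = np.ones(1516)    # predictions for the current test split

try:
    error = y_test - y2_pred   # lengths differ, so NumPy cannot broadcast
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)

# Residuals computed from arrays of the same split have matching shapes.
residuals = y2_test - y2_pred
```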

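How can I plot the 10 different features in different colors and add a legend so the reader knows which color corresponds to which feature? Or how can I generate 10 separate plots/figures showing the residuals? Or is there another way to evaluate the error for each of the 10 variables? My dataset has daily frequency, so I would like to zoom in to see which time period produces the largest errors.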
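One possible approach, sketched under assumptions: draw one residual-versus-feature panel per column, each in its own color with a legend entry, and separately rank rows by absolute residual to find the dates with the largest errors. The helper names `plot_residuals_per_feature` and `largest_residuals` are hypothetical, the demo data is synthetic, and the daily date index is assumed from the description of the dataset.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_residuals_per_feature(X_test, residuals, ncols=5):
    """Draw one residual-vs-feature scatter panel per column of X_test."""
    names = list(X_test.columns)
    nrows = int(np.ceil(len(names) / ncols))
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows),
                             squeeze=False)
    for i, (ax, name) in enumerate(zip(axes.ravel(), names)):
        # 'C0'..'C9' are matplotlib's default color-cycle entries.
        ax.scatter(X_test[name], residuals, s=8, color='C%d' % (i % 10),
                   label=name)
        ax.axhline(0.0, color='grey', linewidth=1)
        ax.set_xlabel(name)
        ax.set_ylabel('residual')
        ax.legend(loc='upper right', fontsize='small')
    # Hide any unused panels in the grid.
    for ax in axes.ravel()[len(names):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig


def largest_residuals(residuals, index, n=5):
    """Return the n rows with the largest absolute residual, keyed by index."""
    s = pd.Series(np.abs(residuals), index=index, name='abs_residual')
    return s.sort_values(ascending=False).head(n)


# Synthetic demo data; the real features and residuals would replace these.
rng = np.random.default_rng(0)
names = ['Dec', 'Rain', 'Snow_flg']  # shortened placeholder list
X_demo = pd.DataFrame(rng.random((50, 3)), columns=names,
                      index=pd.date_range('2017-01-01', periods=50, freq='D'))
res_demo = rng.normal(scale=0.1, size=50)

fig = plot_residuals_per_feature(X_demo, res_demo, ncols=2)
worst = largest_residuals(res_demo, X_demo.index, n=3)
```

With the real data, `X_demo` would be the test DataFrame restricted to the 10 selected columns and `res_demo` would be `y2_test - y2_pred`; calling `plt.show()` with an interactive backend displays the grid. Separate panels are used here because the features have different scales, which would make a single shared x-axis hard to read.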

0 answers:

No answers