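In the line
diabetes_x = diabetes.data[:, np.newaxis, 0]
the code selects one of the 20 features in the diabetes dataset as a vector. Part of this x vector is passed to the machine-learning fit function and the rest is held out for testing. Finally, the test x vectors are passed to the prediction function to get
y_predict=regr.predict(diabetes_x_train)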
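To make the shapes concrete, here is a small sketch comparing the slicing in that line with a few alternatives. It loads the dataset bundled with scikit-learn, which has 442 samples and 10 features; the variable names are illustrative only.

```python
import numpy as np
from sklearn import datasets

data = datasets.load_diabetes().data  # full feature matrix, shape (442, 10)

one_2d = data[:, np.newaxis, 0]  # column 0 as a 2-D (442, 1) matrix
same_2d = data[:, [0]]           # equivalent: a list index also keeps the axis
flat = data[:, 0]                # plain 1-D vector, shape (442,)
all_cols = data[:, :]            # every feature, shape (442, 10)

print(one_2d.shape, same_2d.shape, flat.shape, all_cols.shape)
```

np.newaxis (or a list index such as [0]) keeps the result two-dimensional, which matches the (n_samples, n_features) layout that LinearRegression.fit expects. Taking every column instead of a single one gives the full feature matrix directly.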
The question is: how can I loop over this so that I get a matrix instead of a vector, covering all 20 features?
For example:
diabetes_x is m×n
=> diabetes_x_train and diabetes_x_test are m×n
y_predict is m×n
Here is the Python code:
from sklearn import linear_model # Machine Learning tool
import numpy as np # Mathematics and Linear Algebra tool
import pandas as pd # data structure tool
import matplotlib.pyplot as plt # scientific plotting tool
import seaborn as sns # scientific plotting tool
%matplotlib inline
### Linear Regression example
from sklearn import datasets, linear_model
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
# Use only one feature
diabetes_x = diabetes.data[:, np.newaxis, 0] # change 0 to something else for other features
# Split the data into training/testing sets
diabetes_x_train = diabetes_x[:-20]
diabetes_x_test = diabetes_x[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(diabetes_x_train, diabetes_y_train)
# Obtain prediction based on previous experience
y_predict=regr.predict(diabetes_x_train)
Answer (score: 1)
You can try this to select all 10 features (the dataset has only 10 features, at least with the sklearn 0.18.1 and Python 2.7 I'm using on Windows):
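from sklearn import datasets, linear_model
import matplotlib.pyplot as plt
diabetes = datasets.load_diabetes()
diabetes_x = diabetes.data[:, range(10)] # select all 10 features
# Split the data into training/testing sets
diabetes_x_train = diabetes_x[:-20,:]
print(diabetes_x_train.shape)
# (422, 10)   (printed as (422L, 10L) on Python 2 / Windows)
diabetes_x_test = diabetes_x[-20:,:]
print(diabetes_x_test.shape)
# (20, 10)   (printed as (20L, 10L) on Python 2 / Windows)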
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(diabetes_x_train, diabetes_y_train)
# Obtain prediction based on previous experience
y_predict=regr.predict(diabetes_x_test)
# print(y_predict)
plt.plot(range(len(diabetes.target)), diabetes.target, 'r-', label='target')
plt.plot(range(len(diabetes_y_train), len(diabetes.target)), y_predict, 'g-', label='predict')
plt.legend()
plt.show()
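The answer above fits a single multiple-regression model on all 10 features at once. If what you want is literally the loop described in the question, i.e. one single-feature model per column with the predictions stacked into a matrix, a minimal sketch could look like this. It assumes the same split as above (last 20 rows held out) and uses illustrative names (X, y, n_features) that are not from the original code.

```python
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target
n_features = X.shape[1]

# Same split as above: the last 20 rows are held out for testing
X_train, X_test = X[:-20], X[-20:]
y_train = y[:-20]

# One column of predictions per feature, shape (n_test, n_features)
y_predict = np.empty((X_test.shape[0], n_features))
for j in range(n_features):
    regr = linear_model.LinearRegression()
    # X_train[:, [j]] keeps the 2-D (n_samples, 1) shape that fit() expects
    regr.fit(X_train[:, [j]], y_train)
    y_predict[:, j] = regr.predict(X_test[:, [j]])

print(y_predict.shape)  # (20, 10)
```

Column j of y_predict is what the question's original code produces when 0 is replaced by j, evaluated on the test rows. The two approaches answer different questions: the loop gives 10 independent single-feature models, while the answer's version gives one model that uses all features jointly.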