Multi-output regression from two input datasets

Date: 2018-07-16 14:13:15

Tags: python statistics regression linear-regression lasso

Is it possible to regress one dataset Y on two datasets X1 and X2, and if so, how? X1, X2 and Y are all matrices, so this is a multi-output regression problem.

x1_train, x1_test, x2_train, x2_test, y_train, y_test = train_test_split(x1, x2, y, test_size=0.2)
Lasso_Regr = Lasso(alpha=0.05, normalize=True)
Lasso_Regr.fit([x1_train, x2_train], y_train)
y_pred = Lasso_Regr.predict([x1_test, x2_test])

I get the following error:

Found array with dim 3. Estimator expected <= 2.

[Image: what X1, X2 and Y look like]

1 answer:

Answer 0 (score: 1)

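  1. Splitting the predictors into training sets separately is misleading, because the row-by-row correspondence between the two predictors is required for accurate prediction.

  2. Since you have imported a csv, first transpose it into a vertical (one column per variable) format, then load it into a data frame and run the analysis as follows.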
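The error in the question can be reproduced and avoided without any csv step. The sketch below is only an illustration under assumptions: the arrays are random stand-ins for X1, X2 and Y (their shapes and values are hypothetical, not taken from the question), and `np.hstack` is one possible way of joining the two predictor matrices, not necessarily the only one.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 50 samples, 3 columns in each predictor matrix
rng = np.random.default_rng(0)
x1 = rng.normal(size=(50, 3))
x2 = rng.normal(size=(50, 3))
y = x1 @ np.array([1.0, 0.5, -2.0]) + x2[:, 0]

# A list of two 2-D arrays is read as one 3-D array, which the estimator rejects
bad_ndim = np.asarray([x1, x2]).ndim

# Column-wise stacking gives a single 2-D matrix; row i of x1 stays next to row i of x2
x = np.hstack([x1, x2])

# One split over the stacked matrix keeps predictors and target aligned
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

model = Lasso(alpha=0.05)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
```

Here `bad_ndim` is 3, which is the dimension the error message complains about, while the stacked `x` has the two dimensions the estimator expects.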

Edit: sample code:

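import csv

import pandas as pd
from sklearn import linear_model, model_selection

# Transpose the csv so that each variable becomes a column
# (Python 3: the built-in zip replaces itertools.izip, and files are opened in text mode)
with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    csv.writer(dst).writerows(zip(*csv.reader(src)))

df = pd.read_csv("output.csv")
print(df)

# Predictors and target; the column names must match the header of output.csv
x = df[['x1', 'x2', 'x3']]
y = df['y']

# Split predictors and target together so that the rows stay aligned
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size=0.2)

# normalize=True was removed in scikit-learn 1.2; scale the features beforehand if needed
Lasso_Regr = linear_model.Lasso(alpha=0.05)
Lasso_Regr.fit(x_train, y_train)
y_pred = Lasso_Regr.predict(x_test)
print(y_pred)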

You can add as many predictors as you like.
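Since Y in the question is a matrix rather than a single column, it may help to see the same idea with several predictor blocks and a two-column target. This is a minimal sketch on made-up data: the block sizes, the `blocks` list and `true_coef` are hypothetical names introduced only for illustration. It relies on `Lasso` accepting a 2-D `y` of shape (n_samples, n_targets).

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

# Hypothetical predictor blocks with the same number of rows but different widths
blocks = [
    rng.normal(size=(40, 2)),
    rng.normal(size=(40, 3)),
    rng.normal(size=(40, 1)),
]
x = np.hstack(blocks)  # all predictors side by side, shape (40, 6)

# Hypothetical two-column target built from the predictors
true_coef = rng.normal(size=(6, 2))
y = x @ true_coef  # shape (40, 2)

# Lasso fits every target column against the same predictor matrix
model = Lasso(alpha=0.05).fit(x, y)
y_pred = model.predict(x)

print(y_pred.shape)
print(model.coef_.shape)
```

Adding another predictor only means appending one more block to the list before stacking; the fitted `coef_` then holds one row of coefficients per target column.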