Multiple linear regression in Python machine learning: ValueError: shapes (8,15) and (390,) not aligned

Time: 2019-12-16 12:51:15

Tags: machine-learning regression

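I am trying to use multiple linear regression to estimate an output from a set of inputs. I have trained the model, and I get the expected values when I run the following code: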

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


# Importing the dataset
#dataset = pd.read_csv('50_Startups.csv')
dataset = pd.read_excel('MAHI2.xlsx')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values

# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
X[:, 0] = labelencoder.fit_transform(X[:, 0])

labelencoder1 = LabelEncoder()
X[:, 1] = labelencoder.fit_transform(X[:, 1])

labelencoder2 = LabelEncoder()
X[:, 2] = labelencoder.fit_transform(X[:, 2])

labelencoder3 = LabelEncoder()
X[:, 3] = labelencoder.fit_transform(X[:, 3])

onehotencoder = OneHotEncoder(categorical_features = "all")
#X = onehotencoder.fit_transform(X).toarray()
X = onehotencoder.fit_transform(X).toarray()

# Avoiding the Dummy Variable Trap
X = X[:, 1:]


from sklearn.linear_model import LinearRegression
regressor = LinearRegression()

regressor.fit(X, y)
y_pred = regressor.predict(X)
df = pd.DataFrame({'Actual': y.flatten(), 'Predicted': y_pred.flatten()})
df

Now I am trying to use the same model to evaluate another set of input data, as follows:

dataset1 = pd.read_excel('MAHI3.xlsx')
#dataset2 = pd.get_dummies(dataset1)
X1 = dataset1.iloc[:, :-1].values
y2 = dataset1.iloc[:, 5].values                 

# Encoding categorical data
#labelencoder3 = LabelEncoder()
X1[:, 0] = labelencoder.fit_transform(X1[:, 0])

#labelencoder4 = LabelEncoder()
X1[:, 1] = labelencoder.fit_transform(X1[:, 1])

#labelencoder5 = LabelEncoder()
X1[:, 2] = labelencoder.fit_transform(X1[:, 2])

#labelencoder6 = LabelEncoder()
X1[:, 3] = labelencoder.fit_transform(X1[:, 3])

#onehotencoder2 = OneHotEncoder(categorical_features = "all")
X1 = onehotencoder.fit_transform(X1).toarray()

output = regressor.predict(X1)

df1 = pd.DataFrame({'Actual1': y2.flatten(), 'Predicted1': output.flatten()})
df1

But when I run this code, I get the following error:

ValueError: shapes (6,13) and (390,) not aligned: 13 (dim 1) != 390 (dim 0)

It would be great if someone could help me solve this.

1 answer:

Answer 0: (score: 0)

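I don't have access to your dataset, but your problem looks like a dimension mismatch. What appears to change the dimensions is the "onehotencoder": calling fit_transform again on the new data refits it, so the number of output columns depends on which categories happen to appear in that data rather than on what the model was trained with.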
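To illustrate the point, here is a minimal sketch on made-up data (the arrays `train` and `new` are hypothetical and only stand in for the two spreadsheets). It compares the encoded width when the encoder is refit on each dataset against the width when a single fitted encoder is reused:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column in each set.
train = np.array([["a"], ["b"], ["c"], ["d"]])  # 4 distinct categories
new = np.array([["a"], ["b"]])                  # only 2 of them appear

# Refitting on each dataset separately: the column count follows the data.
width_train = OneHotEncoder().fit_transform(train).toarray().shape[1]
width_new_refit = OneHotEncoder().fit_transform(new).toarray().shape[1]

# Fitting once on the training data and reusing the encoder keeps the width fixed.
encoder = OneHotEncoder().fit(train)
width_new_reused = encoder.transform(new).toarray().shape[1]

print(width_train, width_new_refit, width_new_reused)  # 4 2 4
```

The refit encoder produces 2 columns where the model expects 4, which is the same kind of mismatch that the ValueError reports.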

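Try using the same one-hot encoder for both datasets: fit it once on the training data, then only transform the new data.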

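# Fit the encoder once, on the training features only
ohe = onehotencoder.fit(X)
# Reuse the fitted encoder for both sets so the columns line up
X = ohe.transform(X).toarray()
X1 = ohe.transform(X1).toarray()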

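You should make sure the "regressor" model receives the same number of features at prediction time as it did during training. Note that any later column slicing, such as the X = X[:, 1:] step in your code, also has to be applied to both X and X1 or to neither.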
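As a rough end-to-end sketch of that idea, the example below trains on one small table and predicts on another. The DataFrames `train` and `new`, and the column names `city`, `grade` and `target`, are invented for illustration and are not taken from MAHI2.xlsx or MAHI3.xlsx. It also assumes a scikit-learn version in which `OneHotEncoder` accepts string columns directly (0.20 or later), so the `LabelEncoder` step is skipped, and it uses `handle_unknown="ignore"` so that a category missing from the training data encodes as all zeros instead of raising an error:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-ins for the training file and the new file.
train = pd.DataFrame({
    "city": ["A", "B", "C", "A", "B", "C"],
    "grade": ["x", "y", "x", "y", "x", "y"],
    "target": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],
})
new = pd.DataFrame({
    "city": ["A", "C"],
    "grade": ["y", "y"],
    "target": [1.5, 3.5],
})

feature_cols = ["city", "grade"]

# Fit the encoder once, on the training features only.
encoder = OneHotEncoder(handle_unknown="ignore")
X_train = encoder.fit_transform(train[feature_cols]).toarray()
y_train = train["target"].to_numpy()

regressor = LinearRegression().fit(X_train, y_train)

# Only transform the new data; do not call fit or fit_transform again.
X_new = encoder.transform(new[feature_cols]).toarray()
predictions = regressor.predict(X_new)

# Both matrices now have the same number of columns.
print(X_train.shape, X_new.shape)
print(pd.DataFrame({"Actual": new["target"], "Predicted": predictions}))
```

The sketch leaves out the dummy-variable column drop for simplicity; if you keep it, drop the same column from both matrices.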