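I am trying to run multiple linear regression / lasso / ridge regression in Python, and for that I have to split my data into training and test sets. However, when I try to do this, I get the following error message: "ValueError: could not convert string to float: 'Azerbaijan'". My dataset contains GDP growth values for each country. The GDP growth value for this country does not seem to be any different from those of the other countries, so I don't understand the reason behind the error. EF1 is the name of my dataset, and my code is as follows: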
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# EF1 is a pandas DataFrame loaded earlier (not shown)
# Input features: every column except the last
X = EF1.iloc[:, :-1]
# Output (target): the last column
Y = EF1.iloc[:, -1]

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25)

print("Train data shape of X = %s and Y = %s" % (x_train.shape, y_train.shape))
print("Test data shape of X = %s and Y = %s" % (x_test.shape, y_test.shape))

# Apply the multiple linear regression model
lreg = LinearRegression()
lreg.fit(x_train, y_train)

# Generate predictions on the test set
lreg_y_pred = lreg.predict(x_test)

# Calculate the mean squared error (MSE)
mse = np.mean((lreg_y_pred - y_test) ** 2)
print("Mean squared error on test set:", mse)

# Put together the coefficients and their corresponding variable names
lreg_coefficient = pd.DataFrame()
lreg_coefficient["Columns"] = x_train.columns
lreg_coefficient["Coefficient Estimate"] = pd.Series(lreg.coef_)
print(lreg_coefficient)
```
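
A possible explanation, assuming EF1 stores the country names in one of its columns: `EF1.iloc[:, :-1]` selects every column except the last, so a text column such as the country name would end up among the features, and `LinearRegression.fit` can only work with numeric values. In that case the error would point at the country name itself rather than at its GDP growth value. The sketch below uses a made-up stand-in for EF1 (the column names `Country`, `Investment`, and `GDP_growth` and all values are assumptions, not the real dataset) to show one way to find the non-numeric columns and then either drop or one-hot encode them before fitting.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for EF1: column names and values are invented for illustration.
EF1 = pd.DataFrame({
    "Country": ["Azerbaijan", "Brazil", "Canada", "Denmark",
                "Egypt", "France", "Ghana", "Hungary"],
    "Investment": [21.0, 18.5, 23.1, 22.4, 15.2, 24.0, 19.8, 26.3],
    "GDP_growth": [2.5, 1.2, 1.9, 2.1, 4.3, 1.7, 5.0, 3.4],
})

X = EF1.iloc[:, :-1]
Y = EF1.iloc[:, -1]

# List the feature columns that are not numeric; these cannot be converted to float.
non_numeric = X.select_dtypes(exclude="number").columns.tolist()
print("Non-numeric feature columns:", non_numeric)

# Option 1: drop the text columns and keep only numeric features.
X_numeric = X.drop(columns=non_numeric)

# Option 2: one-hot encode the text columns (one indicator column per category,
# dropping the first category to avoid perfectly collinear indicators).
X_encoded = pd.get_dummies(X, columns=non_numeric, drop_first=True)

# Fit on the numeric-only features; fitting no longer raises the ValueError.
x_train, x_test, y_train, y_test = train_test_split(
    X_numeric, Y, test_size=0.25, random_state=0)
lreg = LinearRegression()
lreg.fit(x_train, y_train)
print("Coefficients:", lreg.coef_)
```

Whether to drop or encode the column depends on the goal: dropping treats the country as a mere label, while one-hot encoding gives each country its own indicator, which only makes sense when there are several rows per country.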