Question

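I preprocessed the dataset and checked the independent variables for possible multicollinearity.

The dataset has 6 columns and 31 rows. I hold out 1/3 of it as X_test and y_test; the rest is X_train and y_train.

I used LinearRegression from sklearn.linear_model to fit a regressor on X_train and y_train, then called its predict method on X_test to get the predicted values of y.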

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('daily_raw_status.csv')
X = dataset.iloc[:, :-1].values # IVs: every column except the last
y = dataset.iloc[:, -1].values # DV: the last column (index 6 is out of bounds for 6 columns)

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split # sklearn.cross_validation was removed in scikit-learn 0.20
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)

# Fitting MLR to the Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression() # create object
regressor.fit(X_train, y_train) # using fit method, fit the multiple regressor to training set

# Predicting the Test set results
y_pred = regressor.predict(X_test)
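Now that I have y_pred, I can compare it with y_test to see whether the two are close.

My question is:

What else can I do with y_pred, or should I focus on interpreting the model? Also, any ideas/concepts on how I could reuse the model on a possible real-time dataset?

Thanks!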

Answer 1

In addition, you can:

Interpret the beta coefficients and the intercept
Find the RMSE or MAE to check the error
If the RMSE or MAE is high: handle outliers or do feature selection (find potential predictors)
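
The steps above could look roughly like the following. This is only a sketch: the data is synthetic (31 rows, 5 made-up predictors standing in for daily_raw_status.csv), and the names true_coefs, mae and rmse are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data; all values are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(31, 5))
true_coefs = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
y = 4.0 + X @ true_coefs + rng.normal(scale=0.5, size=31)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)
regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# Each coefficient is the estimated change in y for a one-unit change in
# that predictor, with the other predictors held fixed.
print("intercept:", regressor.intercept_)
print("coefficients:", regressor.coef_)

# Error metrics on the held-out test set, in the units of y.
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

RMSE penalizes large errors more heavily than MAE, so a big gap between the two is one hint that a few outliers dominate the error.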

Answer 2

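A very typical step in interpreting a regression model is an ANOVA analysis. This common type of analysis lets you assess the significance of the overall model, the significance and size of the coefficients, R², and so on. For an example using the "statsmodels" package, see ANOVA example. Statsmodels generally offers more tools for interpreting and evaluating regression models.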

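To see whether the model is suitable for prediction, you may want to check whether the prediction quality meets your purpose. This might include:

Finding a suitable metric
Comparing performance on the training set and the test set
Checking the range of the values

From a practical point of view, it is often useful to plot the predicted values against the actual values to get a sense of your prediction quality.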
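
A minimal sketch of the last two checks, on synthetic data. "Checking the range" is read here as checking whether test inputs fall outside the range seen during training; that reading is an assumption, as are the names rmse_train, rmse_test and outside.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; all values are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(31, 5))
y = 4.0 + X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=31)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)
regressor = LinearRegression().fit(X_train, y_train)

# A test error far above the training error suggests overfitting.
rmse_train = np.sqrt(mean_squared_error(y_train, regressor.predict(X_train)))
rmse_test = np.sqrt(mean_squared_error(y_test, regressor.predict(X_test)))
print(f"train RMSE = {rmse_train:.3f}, test RMSE = {rmse_test:.3f}")

# Flag test rows where any feature lies outside the training range;
# predictions for those rows are extrapolations.
lo, hi = X_train.min(axis=0), X_train.max(axis=0)
outside = ((X_test < lo) | (X_test > hi)).any(axis=1)
print("test rows outside training range:", int(outside.sum()), "of", len(X_test))
```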
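
One way to draw such a plot with matplotlib, sketched on synthetic data. The output file name pred_vs_actual.png is arbitrary, and the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; all values are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(31, 5))
y = 4.0 + X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=31)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)
y_pred = LinearRegression().fit(X_train, y_train).predict(X_test)

fig, ax = plt.subplots()
ax.scatter(y_test, y_pred)

# Dashed diagonal: points on this line are predicted exactly.
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax.plot(lims, lims, linestyle="--")

ax.set_xlabel("actual y")
ax.set_ylabel("predicted y")
fig.savefig("pred_vs_actual.png")
```

Points scattered tightly around the diagonal indicate good predictions; a systematic bend or fan shape points to a missing nonlinearity or non-constant error variance.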

What to do after predicting y?