Question

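Is the regression line underfitting, and if so, what can I do to get accurate results? I can't yet tell whether a regression line is overfitting, underfitting, or accurate, so advice on how to recognize that would also be appreciated. The file "Advertising.csv" is here: https://github.com/marcopeix/ISL-linear-regression/tree/master/data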

#Importing the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score,mean_squared_error

#reading and knowing the data
data=pd.read_csv('Advertising.csv')
#print(data.head())
#print(data.columns)
#print(data.shape)

#plotting the data
plt.figure(figsize=(10,8))
plt.scatter(data['TV'],data['sales'], c='black')
plt.xlabel('Money Spent on TV ads')
plt.ylabel('Sales')
plt.show()

#storing data into variable and shaping data
X=data['TV'].values.reshape(-1,1)
Y=data['sales'].values.reshape(-1,1)

#calling the model and fitting the model
reg=LinearRegression()
reg.fit(X,Y)

#making predictions
predictions=reg.predict(X)

#plotting the predicted data
plt.figure(figsize=(16,8))
plt.scatter(data['TV'],data['sales'], c='black')
plt.plot(data['TV'],predictions, c='blue',linewidth=2)
plt.xlabel('Money Spent on TV ads')
plt.ylabel('Sales')
plt.show()

r2= r2_score(Y,predictions)
print("R2 score is: ",r2)
#for a regressor, reg.score returns the same R2 value, not a classification accuracy
print("R2 via reg.score: {:.2f}".format(reg.score(X,Y)))

Answer 1

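To determine whether the model is underfitting (or overfitting), you need to look at the model's bias (the distance between the output the model predicts and the expected output). As far as I know, you can't do that just by looking at the code; you also need to evaluate the model (run it).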
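As a minimal sketch of that kind of evaluation, taking "bias" in the informal sense used above, the distance between predictions and expected outputs can be measured through the residuals and their RMSE. The data here is synthetic (the slope, intercept, and noise level are made-up stand-ins for the TV/sales columns), so the numbers say nothing about Advertising.csv itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the TV/sales columns (an assumption, not the real data)
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(200, 1))
Y = 7.0 + 0.05 * X + rng.normal(0, 3.0, size=(200, 1))

reg = LinearRegression().fit(X, Y)
predictions = reg.predict(X)

# Residual = expected output minus predicted output, one value per sample
residuals = Y - predictions

# RMSE summarises the typical size of a residual, in the units of Y
rmse = np.sqrt(mean_squared_error(Y, predictions))

print("Mean residual:", residuals.mean())
print("RMSE:", rmse)
```

With the real data, X and Y would come from data['TV'] and data['sales'] as in the question. A large RMSE relative to the spread of Y, or residuals that show a clear pattern when plotted against X, would hint that a straight line is too simple.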

Since it is a linear regression, you are most likely underfitting.

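I would suggest splitting your data into a training set and a test set. You can fit the model on the training set and use the test set to see how well the model performs on unseen data. If the model performs poorly on both the training data and the test data, it is underfitting. If it performs well on the training data but poorly on the test data, it is overfitting.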

Try something like the following:

from sklearn.model_selection import train_test_split

# This will split the data into a train set and a test set, leaving 20% (the test_size parameter) for testing
X, X_test, Y, Y_test = train_test_split(data['TV'].values.reshape(-1,1), data['sales'].values.reshape(-1,1), test_size=0.2)

# Then fit your model on the training set only
reg = LinearRegression()
reg.fit(X, Y)

# Finally evaluate how well it does on the training and test data.
print("Train score " + str(reg.score(X, Y)))
print("Test score " + str(reg.score(X_test, Y_test)))
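The rule of thumb above can be written down as a small helper. This is only a sketch: `diagnose_fit` is a hypothetical function, and its `low` and `gap` thresholds are illustrative guesses rather than standard values. The data is again a synthetic stand-in for the TV/sales columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def diagnose_fit(train_score, test_score, low=0.5, gap=0.2):
    """Label a fit from its train/test R^2 scores.

    The thresholds are hypothetical and should be tuned per problem.
    """
    # Poor on both sets: the model is too simple for the data
    if train_score < low and test_score < low:
        return "underfitting"
    # Much better on train than on test: the model memorised the training set
    if train_score - test_score > gap:
        return "overfitting"
    return "reasonable"


# Synthetic stand-in for the TV/sales columns (an assumption, not the real data)
rng = np.random.default_rng(42)
X_all = rng.uniform(0, 300, size=(200, 1))
Y_all = 7.0 + 0.05 * X_all + rng.normal(0, 3.0, size=(200, 1))

X, X_test, Y, Y_test = train_test_split(X_all, Y_all, test_size=0.2, random_state=0)

reg = LinearRegression().fit(X, Y)
train_score = reg.score(X, Y)
test_score = reg.score(X_test, Y_test)

print("Train score", train_score)
print("Test score", test_score)
print("Diagnosis:", diagnose_fit(train_score, test_score))
```

A single split can be noisy on a small dataset, so it is worth rerunning with a few different `random_state` values before trusting the label.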

Answer 2

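Instead of training and testing on the same data, split your dataset into 2 or 3 sets (train, validation, test). You may only need to split it into 2 (train, test), which you can do with the sklearn library function train_test_split. Train the model on the training data, then test it on the test data and see whether you get good results. If the model's training accuracy is high but its test accuracy is low, you can say it is overfitting. If the model can't even reach a high accuracy on the training set, it is underfitting. Hope this helps. :)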
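For the three-set variant, one common approach is to call train_test_split twice. The sketch below assumes a 60/20/20 split and uses synthetic data in place of Advertising.csv; the proportions and the variable names (X_rest, X_val, and so on) are illustrative choices, not something the dataset requires.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the TV/sales columns (an assumption, not the real data)
rng = np.random.default_rng(0)
X_all = rng.uniform(0, 300, size=(200, 1))
Y_all = 7.0 + 0.05 * X_all + rng.normal(0, 3.0, size=(200, 1))

# First hold out 20% of all rows as the final test set
X_rest, X_test, Y_rest, Y_test = train_test_split(
    X_all, Y_all, test_size=0.2, random_state=0
)

# Then split the remaining 80% into train (60% overall) and validation (20% overall)
X_train, X_val, Y_train, Y_val = train_test_split(
    X_rest, Y_rest, test_size=0.25, random_state=0
)

# Fit on the training set only, then compare R^2 across the three sets
reg = LinearRegression().fit(X_train, Y_train)
print("Sizes:", len(X_train), len(X_val), len(X_test))
print("Train R2:", reg.score(X_train, Y_train))
print("Validation R2:", reg.score(X_val, Y_val))
print("Test R2:", reg.score(X_test, Y_test))
```

The validation set is the one to look at while comparing models or settings; the test set is best kept for a single final check so that its score stays an honest estimate.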

Is the regression line underfitting the data?
