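I have a mall customers dataset, and I ran k-means on it with k = 5. After fitting a linear regression, I want to print the predicted Y values so I can compare them with the actual Y values. Printing the actual values is easy, but whenever I try to print the predicted Y I get an error. To print the predictions, I used df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
, but I get the error ValueError: array length 35 does not match index length 18
.
Code: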
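import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Raw string so the backslash in the Windows path is not treated as an escape.
# 'Age' is read as well because it is used as a regression feature below.
df = pd.read_csv(r'D:\Mall_Customers.csv', usecols=['Age', 'Annual Income (k$)', 'Spending Score (1-100)'])
# Cluster on the two original columns only, after standardizing them.
x = StandardScaler().fit_transform(df[['Annual Income (k$)', 'Spending Score (1-100)']])
kmeans = KMeans(n_clusters=5, max_iter=100, random_state=0)
y_kmeans = kmeans.fit_predict(x)
# mydict is not defined in this snippet; it presumably maps each cluster label to the row indices in that cluster.
df0 = df[df.index.isin(mydict[0].tolist())]
Y = df0['Spending Score (1-100)']
X = df0[['Annual Income (k$)', 'Age']]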
from sklearn.model_selection import train_test_split
X_train, X_test, y_train,y_test = train_test_split(X, Y, test_size = 0.5, random_state = 0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
r_sq = regressor.score(X, Y)
print('coefficient of determination:', r_sq)
print('intercept:', regressor.intercept_)
print('slope:', regressor.coef_)
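# Predictions are made for every row of X, not only for the test rows.
y_pred = regressor.predict(X)
# The ValueError quoted above is raised here: y_test and y_pred have different lengths.
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})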
print(df)
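
The numbers in the error message suggest what is going on: y_pred comes from regressor.predict(X), so it has one value per row of X (35), while y_test holds only the held-out half (18 rows). A minimal sketch of one way to make the two line up is to predict on X_test instead. The data below is synthetic and made up purely for illustration (35 random rows standing in for the cluster-0 subset), and the name comparison is an arbitrary choice; only the column names are taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one cluster of the mall data (35 rows); values are made up.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    'Annual Income (k$)': rng.uniform(15, 140, size=35),
    'Age': rng.integers(18, 70, size=35),
})
Y = pd.Series(rng.uniform(1, 100, size=35), name='Spending Score (1-100)')

# Same split settings as in the question: half of the rows are held out.
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.5, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict only for the held-out rows so that y_pred has the same length as y_test.
y_pred = regressor.predict(X_test)

# y_test keeps its original row labels; y_pred is a plain array in the same row order.
comparison = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(comparison)
```

If predictions for every row are wanted instead, the other consistent pairing would be regressor.predict(X) together with Y rather than y_test.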