I have the following columns: Col1: string, Col2: float, Col3: float. At prediction time I want to predict the value of Col3:
import numpy as np
from sklearn import model_selection
from sklearn import linear_model
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler

# df1 is a pandas DataFrame with columns Col1 (str), Col2 (float), Col3 (float)
x_columns_to_encode = ['Col1']
x_columns_to_scale = ['Col2']
y_columns_to_scale = ['Col3']

# Instantiate encoder/scalers; use separate scalers for X and y so that
# fitting on Col3 does not overwrite the statistics learned from Col2
x_scaler = StandardScaler()
y_scaler = StandardScaler()
# sparse=False returns a dense NumPy array (the parameter is named
# sparse_output in scikit-learn >= 1.2, and sparse was removed in 1.4)
ohe = OneHotEncoder(sparse=False)

# Scale and encode the columns separately
x_scaled_columns = x_scaler.fit_transform(df1[x_columns_to_scale])
x_encoded_columns = ohe.fit_transform(df1[x_columns_to_encode])
y_scaled_columns = y_scaler.fit_transform(df1[y_columns_to_scale])

# Build the feature matrix: scaled Col2 followed by the one-hot columns of Col1
df = np.concatenate([x_scaled_columns, x_encoded_columns], axis=1)

validation_size = 0.50
seed = 7
x_train, x_validation, y_train, y_validation = model_selection.train_test_split(
    df, y_scaled_columns, test_size=validation_size, random_state=seed)

model = linear_model.LinearRegression()
score = model.fit(x_train, y_train).score(x_validation, y_validation)
print(score)
Running this code gives the error:
"Unable to allocate array with shape (2763330, 25380) and data type float64"
Can someone help me understand where I'm going wrong?
Answer 0 (score: 0)
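One-hot encoding creates a new column for every unique class in a categorical column. If your categorical column has a very large number of unique classes, you can run out of memory, especially with OneHotEncoder(sparse=False), which materializes the result as a dense array. The array in your error message has 2,763,330 rows × 25,380 columns of float64 at 8 bytes each, i.e. about 561 GB (roughly 522.5 GiB), which is far more RAM than a typical machine has, so the allocation fails.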
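A minimal sketch of the idea, assuming a recent scikit-learn and SciPy: it first reproduces the memory estimate for the dense array in the error, then keeps the one-hot output sparse instead of calling sparse=False. The DataFrame here is hypothetical synthetic data standing in for df1 (2,000 made-up categories in Col1); it relies on OneHotEncoder returning a SciPy sparse matrix by default and on LinearRegression accepting sparse input. It is one possible approach, not the only fix.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Size of the dense float64 array from the error message: rows * cols * 8 bytes
rows, cols = 2_763_330, 25_380
dense_bytes = rows * cols * np.dtype(np.float64).itemsize
print(f"dense: {dense_bytes / 2**30:.1f} GiB")  # → dense: 522.5 GiB

# Hypothetical synthetic stand-in for df1 with many distinct Col1 values
rng = np.random.default_rng(7)
n = 10_000
df1 = pd.DataFrame({
    "Col1": rng.integers(0, 2_000, size=n).astype(str),
    "Col2": rng.normal(size=n),
})
df1["Col3"] = 2.0 * df1["Col2"] + rng.normal(scale=0.1, size=n)

# Default OneHotEncoder output is a SciPy sparse matrix: only nonzeros are stored
ohe = OneHotEncoder(handle_unknown="ignore")
x_encoded = ohe.fit_transform(df1[["Col1"]])
x_scaled = StandardScaler().fit_transform(df1[["Col2"]])

# Stack as CSR so the full design matrix is never converted to a dense array
X = sparse.hstack([sparse.csr_matrix(x_scaled), x_encoded], format="csr")
y = df1["Col3"].to_numpy()

# LinearRegression accepts sparse X and solves it without densifying
model = LinearRegression().fit(X, y)
print(X.shape, X.nnz)
print(model.score(X, y))
```

With a sparse matrix each row stores only two nonzero values (the scaled Col2 and a single 1 from the encoding), so memory grows with the number of rows rather than rows × categories.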
It would help if you showed us your data, so we can give better advice.
In the meantime, you could try the following options: