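My data is not normally distributed, so I decided to transform X and y with PowerTransformer before splitting them into X_train, X_test, y_train, and y_test. Is it OK to do it this way? If the transformation should be applied later instead, how should I do it? Here is my code: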
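from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Features and target (df is my DataFrame)
X = df[['Aces', 'TotalPointsWon', 'ServiceGamesWon', 'TotalServicePointsWon']]
y = df[['Winnings']]

# Power-transform X and y on the full data set, before splitting
transformer_X = PowerTransformer()
X_log = transformer_X.fit_transform(X)
transformer_y = PowerTransformer()
y_log = transformer_y.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X_log, y_log, train_size=0.8)

# Fit the scaler on the training set only, then keep the scaled arrays
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LinearRegression()
model.fit(X_train, y_train)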
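For comparison, here is a rough sketch of what I assume the "transform later" variant would look like: split first, then let a pipeline fit PowerTransformer on the training data only. The synthetic df and the random_state values are placeholders I made up so the snippet runs on its own; they are not my real data. As far as I know, PowerTransformer already standardizes its output by default (standardize=True), so I left StandardScaler out of this version.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Placeholder data standing in for my real df (made up for illustration).
rng = np.random.default_rng(0)
features = ['Aces', 'TotalPointsWon', 'ServiceGamesWon', 'TotalServicePointsWon']
df = pd.DataFrame(rng.lognormal(size=(200, 4)), columns=features)
df['Winnings'] = df[features].sum(axis=1) * rng.lognormal(sigma=0.1, size=200)

X = df[features]
y = df['Winnings']

# Split first, so the test rows never influence the fitted transformers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)

# The inner pipeline power-transforms X and then fits the regression.
# TransformedTargetRegressor power-transforms y during fit and applies
# the inverse transform to predictions automatically.
model = TransformedTargetRegressor(
    regressor=make_pipeline(PowerTransformer(), LinearRegression()),
    transformer=PowerTransformer(),
)
model.fit(X_train, y_train)

# Predictions come back on the original Winnings scale.
y_pred = model.predict(X_test)
```

With this layout, calling model.fit only ever sees X_train and y_train, and model.predict reuses the parameters learned there for X_test.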
Thanks for your help.