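I am working on some new code and need a little help with ridge regularized regression. I am trying to build a predictive model, but first I need the rows of my x and y matrices to match.
I found similar questions through a Google search, but their data was randomly generated rather than supplied, as mine is. My data is a large dataset with over 500,000 observations and 670 variables.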
library(rsample)
library(glmnet)
library(dplyr)
library(ggplot2)
# Create training (70%) and test (30%) sets
# Use set.seed for reproducibility
set.seed(123)
alumni_split<-initial_split(alumni, prop=.7, strata = "Id.Number")
alumni_train<-training(alumni_split)
alumni_test<-testing(alumni_split)
#----
# Create training and testing feature model matrices and response vectors.
# we use model.matrix(...)[, -1] to discard the intercept
alumni_train_x <- model.matrix(Id.Number ~ ., alumni_train)[, -1]
alumni_test_x <- model.matrix(Id.Number ~ ., alumni_test)[, -1]
alumni_train_y <- log(alumni_train$Id.Number)
alumni_test_y <- log(alumni_test$Id.Number)
# What is the dimension of your feature matrix?
dim(alumni_train_x)
#---- [HERE]
# Apply Ridge regression to alumni data
alumni_ridge <- glmnet(alumni_train_x, alumni_train_y, alpha = 0)
Error message (with code):
alumni_ridge <- glmnet(alumni_train_x, alumni_train_y, alpha = 0)
Error in glmnet(alumni_train_x, alumni_train_y, alpha = 0) :
  number of observations in y (329870) not equal to the number of rows of x (294648)
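One possible explanation for the mismatch, assuming some predictors contain missing values (I cannot confirm this from the post): model.matrix() drops rows containing NA by default, whereas alumni_train$Id.Number keeps every row. The sketch below reproduces that behaviour on a small made-up data frame; `toy`, `x1`, `x2` and `y` are hypothetical names standing in for the real data, and only base R is used.

```r
# Toy data standing in for `alumni` (hypothetical; the real data are not shown)
set.seed(123)
toy <- data.frame(
  y  = runif(10, min = 1, max = 100),
  x1 = rnorm(10),
  x2 = factor(rep(c("a", "b"), 5))
)
toy$x1[c(2, 5)] <- NA   # introduce missing values in one predictor

# model.matrix() builds a model frame internally; with the default
# na.action (na.omit) the two rows containing NA are dropped
x_dropped <- model.matrix(y ~ ., toy)[, -1]
nrow(x_dropped)   # 8 rows
length(toy$y)     # 10 values, so x and y no longer line up

# One way to keep them aligned: build the model frame once and take
# both the feature matrix and the response from that same frame
mf <- model.frame(y ~ ., toy)                      # NA rows removed here
x  <- model.matrix(attr(mf, "terms"), mf)[, -1]    # drop the intercept column
y  <- log(model.response(mf))
stopifnot(nrow(x) == length(y))
```

If this is what is happening with the alumni data, the 329870 - 294648 = 35222 missing rows would be the ones with at least one NA, and sum(!complete.cases(alumni_train)) should return that number. Filtering with complete.cases() before calling model.matrix() would be an alternative to the model.frame() approach.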