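I have fitted a multiple linear regression model with lm() on the training set, using all predictors except 'lastname', and now I want to make predictions for the test set. However, when I try to do this with predict(model.fit, test), I get an error about the variable 'lastname'.
I tried passing a test set without the 'lastname' column, but that didn't work either.
Code: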
cf_df <- read.csv(file="cap_friendly_data.csv", header=TRUE, sep=",")
new_cols <- c('lastname', 'Position', 'Age.Years', 'Original.Cap.Hit', 'New.Signing.Status', 'PPG.Prior.Signing', 'PPG.Contract.Year', 'New.Cap.Hit')
new_stats <- cf_df[, new_cols]
#create training and testing datasets
set.seed(2430)
num_training_samples <- 2000
train_indices <- sample(1:nrow(new_stats), num_training_samples, replace = FALSE)
train <- new_stats[train_indices, ]
test <- new_stats[-train_indices, ]
test_results <- test$New.Cap.Hit
#fit model
cap.fit <- lm(New.Cap.Hit ~ . - lastname, data = train)
summary(cap.fit)
predictions <- predict(cap.fit, test)
I expected to simply get the model's predictions, but instead I got the following error message:
predictions <- predict(cap.fit, test)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  factor lastname has new levels berg, Acciari, Acolatse, Alfredsson, Anderson, Angelidis, Arnold, Backes, Balisy, Baptiste, Barch ...
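A minimal reproduction may help show what is going on. The sketch below uses small, hypothetical toy data (not the cap_friendly data) with the same column names. It checks that `. - lastname` removes lastname as a coefficient but keeps it among the model's variables, so its levels are stored in fit$xlevels, and it shows why both predict() calls in the question fail.

```r
# Hypothetical toy data (not the cap_friendly data): 'lastname' is a factor and
# none of the test names occur in the training rows.
train <- data.frame(
  lastname    = factor(c("A", "B", "C", "D")),
  Age.Years   = c(20, 25, 30, 35),
  New.Cap.Hit = c(1.0, 1.5, 2.1, 2.4)
)
test <- data.frame(
  lastname  = factor(c("E", "F")),
  Age.Years = c(22, 33)
)

fit <- lm(New.Cap.Hit ~ . - lastname, data = train)

# 'lastname' is not a coefficient of the model ...
print(names(coef(fit)))
# ... but it is still a variable of the formula, so its levels are recorded
print(names(fit$xlevels))

# predict() compares the test levels with the recorded ones and stops
err_levels <- tryCatch(predict(fit, test),
                       error = function(e) conditionMessage(e))
print(err_levels)

# Dropping the column from the test set fails too: the variable is still looked up
err_missing <- tryCatch(predict(fit, test[, "Age.Years", drop = FALSE]),
                        error = function(e) conditionMessage(e))
print(err_missing)
```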
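Answer:
Could you try removing the lastname column from the data before splitting and fitting? Writing `. - lastname` drops lastname as a term, but it is still one of the variables the formula refers to, so lm() stores its factor levels in cap.fit$xlevels and predict() checks the test set against them. That is why you get "new levels" when the column is present, and an "object 'lastname' not found" error when it is missing.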
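str(new_stats)  # check the column types; lastname is read in as a factor
# remove the lastname column before creating the train/test split
new_stats = subset(new_stats, select = -c(lastname))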
#create training and testing datasets
set.seed(2430)
num_training_samples <- 2000
train_indices <- sample(1:nrow(new_stats), num_training_samples, replace = FALSE)
train <- new_stats[train_indices, ]
test <- new_stats[-train_indices, ]
test_results <- test$New.Cap.Hit
#fit model
cap.fit <- lm(New.Cap.Hit ~ ., data = train)
summary(cap.fit)
# do predictions
predictions <- predict(cap.fit, test)
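If you want to keep lastname in the test set (for example, to label each prediction with a player), two variations on the same idea may work. This is a sketch with hypothetical toy data standing in for train and test. Option 1 drops lastname only from the data passed to lm(). Option 2 builds the formula with reformulate() so lastname never appears in it. Neither is the answer's exact code.

```r
# Hypothetical toy data standing in for train/test (not the real cap_friendly data)
train <- data.frame(
  lastname    = factor(c("A", "B", "C", "D")),
  Age.Years   = c(20, 25, 30, 35),
  New.Cap.Hit = c(1.0, 1.5, 2.1, 2.4)
)
test <- data.frame(
  lastname    = factor(c("E", "F")),
  Age.Years   = c(22, 33),
  New.Cap.Hit = c(1.2, 2.2)
)

# Option 1: drop the identifier only from the data handed to lm()
predictors <- setdiff(names(train), "lastname")
cap.fit <- lm(New.Cap.Hit ~ ., data = train[, predictors])

# The test set may keep 'lastname'; predict() only evaluates the model's variables
predictions <- predict(cap.fit, test)

# Label each prediction with the player it belongs to
results <- data.frame(lastname  = test$lastname,
                      actual    = test$New.Cap.Hit,
                      predicted = predictions)
print(results)

# Option 2: build the formula explicitly so 'lastname' never enters it
f <- reformulate(setdiff(names(train), c("lastname", "New.Cap.Hit")),
                 response = "New.Cap.Hit")
cap.fit2 <- lm(f, data = train)
predictions2 <- predict(cap.fit2, test)
```

Both fits use the same predictors here, so they should give the same predictions. Option 2 avoids copying the data and makes the list of predictors explicit.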