Problem with predict() when fitting a multiple linear regression model

Asked: 2019-07-22 21:10:35

Tags: r linear-regression

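I have fitted a multiple linear regression model with lm(), using every predictor in the training set except 'lastname', and now I want to make predictions on the test set. However, when I try predict(cap.fit, test), I get an error about the variable 'lastname'.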

I also tried passing a test set that does not contain the 'lastname' column, but that did not work either.

Code:

cf_df <- read.csv(file="cap_friendly_data.csv", header=TRUE, sep=",")

new_cols <- c('lastname', 'Position', 'Age.Years', 'Original.Cap.Hit', 'New.Signing.Status', 'PPG.Prior.Signing', 'PPG.Contract.Year', 'New.Cap.Hit')

new_stats <- cf_df[, new_cols]

#create training and testing datasets
set.seed(2430)
num_training_samples <- 2000
train_indices <- sample(1:nrow(new_stats), num_training_samples, replace = FALSE)
train <- new_stats[train_indices, ]
test <- new_stats[-train_indices, ]
test_results <- test$New.Cap.Hit

#fit model
cap.fit <- lm(New.Cap.Hit ~ . - lastname, data = train)
summary(cap.fit)

predictions <- predict(cap.fit, test)

I expected to simply get a vector of predictions from the model, but instead I got the following error message:

predictions <- predict(cap.fit, test)

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :   factor lastname has new levels berg, Acciari, Acolatse, Alfredsson, Anderson, Angelidis, Arnold, Backes, Balisy, Baptiste, Barch ...

1 answer:

Answer 0: (score: 0)

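Can you try this? It drops the lastname column from the data before splitting and fitting, instead of excluding it in the formula. With `. - lastname`, lastname is still part of the model frame, so predict() checks its factor levels against the ones seen in training.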

str(new_stats)

# remove the lastname column so lm() never sees it
new_stats <- subset(new_stats, select = -c(lastname))

#create training and testing datasets
set.seed(2430)
num_training_samples <- 2000
train_indices <- sample(1:nrow(new_stats), num_training_samples, replace = FALSE)
train <- new_stats[train_indices, ]
test <- new_stats[-train_indices, ]
test_results <- test$New.Cap.Hit

#fit model
cap.fit <- lm(New.Cap.Hit ~ ., data = train)
summary(cap.fit)

# do predictions
predictions <- predict(cap.fit, test)
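If you want to check the behaviour without the original CSV, here is a minimal sketch on made-up data. The data frame `toy`, its values, and the shortened column set are assumptions for illustration only, not the asker's data. It fits the model both ways: excluding lastname in the formula, which should reproduce the error, and dropping the column before fitting, which should predict cleanly.

```r
# Toy stand-in for cap_friendly_data.csv; every value here is invented.
set.seed(2430)
n <- 60
toy <- data.frame(
  lastname  = factor(paste0("Player", seq_len(n))),  # one level per row, like surnames
  Age.Years = sample(20:35, n, replace = TRUE),
  PPG       = runif(n, 0, 1.2)
)
toy$New.Cap.Hit <- 700000 + 3000000 * toy$PPG + rnorm(n, sd = 200000)

train_idx <- sample(seq_len(n), 40)
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# 1) Excluding lastname in the formula only: the column stays in the model
#    frame, so its training levels are stored and checked by predict().
fit_formula <- lm(New.Cap.Hit ~ . - lastname, data = train)
err <- tryCatch(predict(fit_formula, test),
                error = function(e) conditionMessage(e))
print(err)  # should be the complaint about new levels of lastname

# 2) Dropping the column from the data before fitting.
keep <- setdiff(names(toy), "lastname")
fit_dropped <- lm(New.Cap.Hit ~ ., data = train[, keep])
predictions <- predict(fit_dropped, test[, keep])
head(predictions)
```

The second fit never sees lastname, so there are no stored levels for predict() to compare against.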