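I am trying to run a simple model on a PCA-transformed dataset. However, when I try to copy the outcome column across with
train.pca['outcome'] = train.df[,'outcome']
I get the following error: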
Error in `[<-.data.frame`(`*tmp*`, "outcome", value = c(1L, 1L, 1L, 1L, :
replacement has 500 rows, data has 32000
Here is the full code that produces the error.
library(xgboost)
library(readr)
library(caret)
train.raw = read.csv("file", header = TRUE, sep = ",")
drop = c('column')
train.df = train.raw[, !(names(train.raw) %in% drop)]
train.df[,'outcome'] = as.factor(train.df[,'outcome'])
train.c1 = subset(train.df , outcome == 1)
train.c0 = subset(train.df , outcome == 0)
fit.pca = prcomp(train.df[,1:100], retx = TRUE, center = T, scale = T)
summary(fit.pca)
train.pca = data.frame(fit.pca$x)
train.pca = train.pca[,1:20]
train.pca['outcome'] = train.df[,'outcome']
I think this has something to do with how I am subsetting the data, but I'm not sure what exactly is wrong. Any help is appreciated.
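For what it's worth, the message says the value being assigned has 500 rows while the target data frame has 32000, so the two objects do not have the same number of rows at the point of assignment. Below is a minimal sketch of the row-count check I would expect to pass before the assignment. It runs on synthetic data: toy.df, toy.fit, toy.pca and the column layout are made up for illustration and are not my real data.

```r
# Synthetic stand-in for train.df (hypothetical): 4 numeric features
# plus a two-level factor outcome, 50 rows in total.
set.seed(1)
toy.df <- data.frame(matrix(rnorm(200), nrow = 50, ncol = 4))
toy.df$outcome <- factor(rep(c(0, 1), times = 25))

# PCA on the feature columns only; the score matrix x has one row
# per row of the input.
toy.fit <- prcomp(toy.df[, 1:4], retx = TRUE, center = TRUE, scale. = TRUE)
toy.pca <- data.frame(toy.fit$x)[, 1:2]

# Compare row counts before assigning. A mismatch here is what produces
# "replacement has N rows, data has M".
rows.match <- nrow(toy.pca) == nrow(toy.df)
stopifnot(rows.match)

# Safe to attach the outcome once the row counts agree.
toy.pca$outcome <- toy.df$outcome
```

Running the same nrow() comparison on train.pca and train.df just before the failing line should show which of the two objects has the unexpected size.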