Sorry if the title is poorly worded. Let me explain my problem first; I'm also not very familiar with R.
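I'm running a script that uses randomForest to train on a set of data (both are shown below). However, this dataset has a column of NAs (Call_vs_Noise). I need to keep this column, but I can't get past an error that I believe the NAs are causing. Note: the R script itself is very long, so I haven't posted the whole thing.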
The part of the code that causes the problem:
```r
# Excerpt from a longer script; i is the sample percentage
my.perf = nfold.xval(events[events$Random.Percent <= i, ],
                     fold=10, annotation=paste(i, "% Bootstrap Sample", sep=""))[[1]]
perf[[i]] = rejigger.perf(my.perf)
op.2 = calculate.operating.parameters(perf[[i]], method="confidence.range")$op
op.3 = calculate.operating.parameters(perf[[i]], method="frequency")$op
op[i] = ifelse(op.2 < op.3, op.2, op.3)

# Count noise and call events in the i% sample
bs.noise = dim(events[events$Random.Percent <= i & events$Call_vs_Noise == "Noise",])[1]
bs.call = dim(events[events$Random.Percent <= i & events$Call_vs_Noise == "Call",])[1]
print(paste("Bootstrap sample:", bs.noise, "noise,", bs.call, "call."))

# Split the sample into training and test sets, then fit the forest
splits = splitdf(events[events$Random.Percent <= i, ], weight=7/10)
rf = randomForest(Call_vs_Noise ~ ., data = splits$trainset[, c(seewave.measures, "Call_vs_Noise", "Detector")],
                  sampsize = 0.99 * nrow(splits$trainset))
```
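One thing worth checking in the counting lines: when Call_vs_Noise is entirely NA, `events$Call_vs_Noise == "Noise"` is NA for every row, and indexing a data frame with a logical NA returns a row of NAs instead of dropping it. So `bs.noise` and `bs.call` would likely both report the size of the sample rather than 0. A minimal base-R sketch on hypothetical toy data (the five rows only mimic the layout of my data):

```r
# Hypothetical toy data: an all-NA response column, as in my dataset
events <- data.frame(
  Call_vs_Noise  = NA,
  Detector       = "Thrush",
  Begin.Time..s. = c(2.101, 2.183, 2.746, 3.463, 3.689),
  End.Time..s.   = c(2.215, 2.798, 2.984, 3.593, 4.008)
)

# NA == "Noise" is NA, and a logical NA row index yields an all-NA row,
# so every row is still counted
naive.noise <- dim(events[events$Call_vs_Noise == "Noise", ])[1]

# which() keeps only the indices that are TRUE, so NA comparisons are skipped
safe.noise <- length(which(events$Call_vs_Noise == "Noise"))

print(naive.noise)  # [1] 5
print(safe.noise)   # [1] 0
```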
My data format:
```
Call_vs_Noise   NA     NA     NA     NA     NA     ....
Detector        Thrush Thrush Thrush Thrush Thrush ....
Selection       1      1      1      1      1      ....
Begin.Time..s.  2.101  2.183  2.746  3.463  3.689
End.Time..s.    2.215  2.798  2.984  3.593  4.008
...
```

I have 45 variables and 32,456 observations, and Call_vs_Noise is a column of NAs.
The error I get:
```
Error in randomForest.default(m, y, ...) : data (x) has 0 rows
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values. Are you sure you want to do regression?
```
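The warning suggests randomForest is treating the problem as regression. As far as I understand, this happens because an all-NA column is read into R as logical rather than as a factor, and randomForest assumes classification only when the response is a factor. A small base-R sketch of what such a response looks like (the vector `y` is a hypothetical stand-in, and the condition below is my reading of when the warning fires, not the package's actual code):

```r
# Hypothetical stand-in for the response column: read in as all NA
y <- rep(NA, 5)

print(class(y))           # [1] "logical"
print(is.factor(y))       # [1] FALSE
print(length(unique(y)))  # [1] 1

# Roughly the situation behind the warning: a non-factor response
# with five or fewer distinct values
looks.like.regression <- !is.factor(y) && length(unique(y)) <= 5
print(looks.like.regression)  # [1] TRUE

# Even as a factor with the expected levels, no row carries a label
y.factor <- factor(y, levels = c("Call", "Noise"))
print(table(y.factor))
```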
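From what I've read online, the NAs in my data seem to be causing this error. However, if I remove the NAs by dropping rows, I essentially delete all of my data. Is there a way around this?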
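The row-dropping problem is easy to reproduce: `na.omit()` (and `complete.cases()`) remove any row with at least one NA, so a single all-NA column removes every row. That would be consistent with the "0 rows" error if rows are being dropped this way somewhere before the fit. In the hypothetical toy data below, the other columns on their own have no missing values:

```r
# Hypothetical toy data mimicking my layout
events <- data.frame(
  Call_vs_Noise  = NA,
  Detector       = "Thrush",
  Begin.Time..s. = c(2.101, 2.183, 2.746, 3.463, 3.689),
  End.Time..s.   = c(2.215, 2.798, 2.984, 3.593, 4.008)
)

# Every row has an NA in Call_vs_Noise, so every row is dropped
kept <- na.omit(events)
print(nrow(events))                 # [1] 5
print(nrow(kept))                   # [1] 0
print(sum(complete.cases(events)))  # [1] 0

# Without the all-NA column, nothing is dropped
without.response <- na.omit(events[, names(events) != "Call_vs_Noise"])
print(nrow(without.response))       # [1] 5
```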