Keeping a column of NAs when running randomForest

Time: 2013-07-24 16:29:17

Tags: r na

Apologies if the title is poorly worded; let me explain my problem first. I am also fairly new to R.

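I am running a script that needs randomForest to train on a dataset (details below). However, this dataset has a column of NAs (Call_vs_Noise). I need to keep this column, but I cannot get past an error that I believe the NAs are causing. Note: the R script itself is very long, so I have not posted the whole thing.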

The part of the code that is causing the problem:

    my.perf = nfold.xval(events[events$Random.Percent <= i, ],
                         fold=10, annotation=paste(i, "% Bootstrap Sample", sep=""))[[1]]
    perf[[i]] = rejigger.perf(my.perf)
    op.2 = calculate.operating.parameters(perf[[i]], method="confidence.range")$op
    op.3 = calculate.operating.parameters(perf[[i]], method="frequency")$op
    op[i] = ifelse(op.2 < op.3, op.2, op.3)
    bs.noise = dim(events[events$Random.Percent <= i & events$Call_vs_Noise == "Noise",])[1]
    bs.call = dim(events[events$Random.Percent <= i & events$Call_vs_Noise == "Call",])[1]
    print(paste("Bootstrap sample:", bs.noise, "noise,", bs.call, "call."))

    splits = splitdf(events[events$Random.Percent <= i, ], weight=7/10)
    rf = randomForest(Call_vs_Noise ~ ., data = splits$trainset[, c(seewave.measures, "Call_vs_Noise", "Detector")],
                      sampsize = 0.99 * nrow(splits$trainset))

The format of my data:

Call_vs_Noise  NA NA NA NA NA ....
Detector       Thrush Thrush Thrush Thrush Thrush....
Selection      1 1 1 1 1 ....  
Begin.Time..s. 2.101 2.183 2.746 3.463  3.689             
End.Time..s.   2.215 2.798 2.984 3.593  4.008                      

... I have 45 variables and 32456 observations
Call_vs_Noise is a column of NAs
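
A side note on the counting lines in the snippet: with Call_vs_Noise entirely NA, comparisons such as `events$Call_vs_Noise == "Noise"` evaluate to NA rather than FALSE. The following is a minimal base-R sketch of that behaviour. The data frame `events_demo` and the value of `i` are made up for illustration and are not the real dataset.

```r
# Hypothetical toy data standing in for `events`; not the real dataset.
events_demo <- data.frame(
  Random.Percent = c(1, 2, 3, 4),
  Call_vs_Noise  = factor(c(NA, NA, NA, NA), levels = c("Call", "Noise"))
)
i <- 2  # assumed loop value, for illustration only

# TRUE & NA is NA, while FALSE & NA is FALSE.
is_noise <- events_demo$Random.Percent <= i & events_demo$Call_vs_Noise == "Noise"

# Logical indexing with NA returns an all-NA placeholder row for each NA,
# so a dim()-based count includes rows that are not known to be "Noise".
count_dim <- dim(events_demo[is_noise, ])[1]

# Summing the logical vector with na.rm = TRUE counts only definite matches.
count_sum <- sum(is_noise, na.rm = TRUE)

print(c(dim_based = count_dim, sum_based = count_sum))
```

If the `bs.noise` and `bs.call` counts look too large, this NA-indexing behaviour is one possible explanation worth ruling out.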

The error I get:

Error in randomForest.default(m, y, ...) : data (x) has 0 rows
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values.  Are you sure you want to do regression?

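From what I have gathered online, the NAs in my data seem to be causing this error. However, if I remove the NAs by deleting rows, I end up deleting essentially all of my data. Is there a way around this?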
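
To make the "deleting rows deletes everything" point concrete, here is a small base-R sketch on hypothetical toy data (`train_demo` is invented for illustration). It shows that row-wise NA removal leaves zero rows when one column is entirely NA. That is consistent with the "data (x) has 0 rows" message, although it does not prove this is the cause in the real script.

```r
# Hypothetical toy data; column names mirror the question, values are made up.
train_demo <- data.frame(
  Call_vs_Noise  = c(NA, NA, NA),
  Detector       = c("Thrush", "Thrush", "Thrush"),
  Begin.Time..s. = c(2.101, 2.183, 2.746)
)

# Every row contains an NA in Call_vs_Noise, so na.omit() drops all of them.
rows_after_omit <- nrow(na.omit(train_demo))

# Identify columns that contain no observed values at all.
all_na_cols <- names(train_demo)[colSums(!is.na(train_demo)) == 0]

# Setting those columns aside before removing incomplete rows keeps the data.
kept_cols <- setdiff(names(train_demo), all_na_cols)
rows_without_col <- nrow(na.omit(train_demo[, kept_cols, drop = FALSE]))

print(all_na_cols)
print(c(with_column = rows_after_omit, without_column = rows_without_col))
```

Note that a supervised fit needs known labels in the response, so a Call_vs_Noise column with no observed values cannot serve as the training target as it stands. The column can still be kept in the data frame and left out of the model's predictors and response.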

0 answers:

No answers