Error when using predict with NaiveBayes from the klaR package

Time: 2013-01-16 06:01:57

Tags: r bayesian predict

I am using the predict method from the klaR package, as mentioned in the post "Naive bayes in R":

nb_testpred <- predict(mynb, newdata=testdata)

mynb is my naive Bayes model, fitted on traindata; testdata is the remaining data.

However, I get this error:

Error in FUN(1:10[[4L]], ...) : subscript out of bounds

I am not sure what is going on: testdata has fewer rows than traindata and the same number of columns.

For reference, my code looks like this:

ind       <- sample(2, nrow(mydata), replace=TRUE, prob=c(0.9,0.1))
traindata <- mydata[ind==1,]
testdata  <- mydata[ind==2,]
myformula <- as.factor(dep) ~ X1 + as.factor(X2) + as.factor(X3) + as.factor(X4) + X5 + as.factor(X6) + as.factor(date) + as.factor(hour)
mynb        <- NaiveBayes(myformula, data=traindata)
nb_testpred <- predict(mynb, newdata=testdata) #where I'm getting an error...

Here is a sample of the data (the original file has more than 100,000 rows):

sampledata <- structure(list(dep = c(1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), X1 = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("A", "B"), class = "factor"), X2 = c(200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 
200L, 200L), X3 = structure(c(4L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L), .Label = c(".", "1400000", "2400000", "900000"), class = "factor"), X4 = c(0L, 0L, 0L, 3L, 4L, 5L, 5L, 5L, 5L, 0L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 0L), X5 = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE), X6 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),     date = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("9/23/2012", 
"9/24/2012"), class = "factor"), hour = c(18L, 17L, 23L, 8L, 1L, 19L, 19L, 16L, 22L, 2L, 12L, 16L, 15L, 9L, 1L, 9L, 
13L, 19L)), .Names = c("dep", "X1", "X2", "X3", "X4", "X5", "X6", "date", "hour"), class = "data.frame", row.names = c(NA, -18L))

Any help is much appreciated!

1 answer:

Answer 0 (score: 0)

You can proceed as follows:

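# Convert the response to a factor up front instead of inside the formula
traindata$dep <- factor(traindata$dep)
# Use every remaining column as a predictor
mynb <- NaiveBayes(dep ~ ., traindata)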

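With this change the prediction works, but you should also clean up your data so that it contains no constant columns (in the sample above, X2 is always 200).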
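The advice above can be sketched end to end. This is a minimal illustration, not the answerer's exact code: the toy `mydata` below is made up, and the `factor_cols` and `is_constant` names are hypothetical helpers introduced only for this example. One plausible reason the change helps (an assumption, not confirmed in the thread) is that `predict` looks up predictors by name in `newdata`, and names such as `as.factor(X2)` created inside the formula do not exist there. Converting columns on the full data set before splitting also keeps the factor levels identical in both subsets.

```r
# Toy data standing in for mydata (values are invented for illustration).
set.seed(42)
n <- 200
mydata <- data.frame(
  dep = rbinom(n, 1, 0.3),
  X1  = factor(sample(c("A", "B"), n, replace = TRUE)),
  X2  = rep(200L, n),                          # constant, as in the sample data
  X4  = sample(0:5, n, replace = TRUE),
  X5  = sample(c(TRUE, FALSE), n, replace = TRUE)
)

# 1. Convert categorical columns to factors on the full data set, before
#    splitting, so the formula needs no as.factor() calls and train/test
#    share the same levels.
factor_cols <- c("dep", "X2", "X4", "X5")
mydata[factor_cols] <- lapply(mydata[factor_cols], factor)

# 2. Drop columns that take a single value; they carry no information.
is_constant <- vapply(mydata, function(col) length(unique(col)) < 2, logical(1))
mydata <- mydata[, !is_constant, drop = FALSE]

# 3. Split into training and test sets as in the question.
ind       <- sample(2, nrow(mydata), replace = TRUE, prob = c(0.9, 0.1))
traindata <- mydata[ind == 1, ]
testdata  <- mydata[ind == 2, ]

# 4. Fit with a plain formula and predict (runs only if klaR is installed).
if (requireNamespace("klaR", quietly = TRUE)) {
  mynb        <- klaR::NaiveBayes(dep ~ ., data = traindata)
  nb_testpred <- predict(mynb, newdata = testdata)
  print(head(nb_testpred$class))
}
```

The same preprocessing should carry over to the real data by replacing the toy `mydata` and listing the actual categorical columns (X2, X3, X4, X6, date, hour) in `factor_cols`.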