Decision tree: "Levels in factors of new data do not match original data"

Asked: 2018-09-24 10:04:32

Tags: r tree decision-tree

I have this dataset:

head(mydata)
chr    start     stop strand num_probes segment_mean is_nocnv
chr18 52502759 52502887      *          2      -2.3870      YES
chr18 52508963 68598272      *       9546      -0.3843      YES
chrX 17018571 63154896      *      18479      -0.0448      YES
chrX 63161754 63812965      *        265      -0.5375      YES
chrX 63816350 66632343      *       1071       0.1047      YES
chrX 66632547 67941468      *        558      -0.5452      YES

sapply(mydata, class)
     chr        start         stop       strand   num_probes segment_mean     is_nocnv 
"factor"    "integer"    "integer"     "factor"    "integer"    "numeric"     "factor" 

My decision tree should predict whether is_nocnv is YES or NO for a new "sample".

I built the prediction tree and then passed it this new example:

chr1 <- "chr18"
start1 <- as.integer(52502759)
stop1 <- as.integer(52502887)
strand1 <- "*"
num_probes1 <- as.integer(2)
segment_mean1 <- as.numeric(-2.387)
testData <- data.frame(chr = chr1, start = start1, stop = stop1, strand = strand1, 
    num_probes = num_probes1, segment_mean = segment_mean1, is_nocnv = "")

testPred <- predict(albero, newdata = testData)
table(testPred, testData$is_nocnv)

I get this error:

> testPred <- predict(albero, newdata = testData)
Error in checkData(oldData, RET) : 
 Levels in factors of new data do not match original data
> table(testPred,testData$is_nocnv)
 Error in table(testPred, testData$is_nocnv) : 
  all arguments must have the same length
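
A likely cause of the first error is that the factor columns of the new data do not carry the same level sets as the training data: data.frame() builds chr, strand and is_nocnv from single strings, so they become one-level factors (or plain character columns in R 4.0 and later), and is_nocnv = "" introduces a level that never occurred in training. The second error then follows because testPred was never created. Below is a minimal sketch of building the new sample with the training levels. The toy `mydata` (including its one NO row) is an invented stand-in for the real data, and the final predict() call is left commented out because the fitted tree `albero` is not recreated here.

```r
# Toy stand-in for the training data (assumption: the real `mydata` has the
# same column names and the same factor columns as shown in the question).
mydata <- data.frame(
  chr          = factor(c("chr18", "chr18", "chrX", "chrX")),
  start        = c(52502759L, 52508963L, 17018571L, 63161754L),
  stop         = c(52502887L, 68598272L, 63154896L, 63812965L),
  strand       = factor(c("*", "*", "*", "*")),
  num_probes   = c(2L, 9546L, 18479L, 265L),
  segment_mean = c(-2.3870, -0.3843, -0.0448, -0.5375),
  is_nocnv     = factor(c("YES", "YES", "NO", "YES"), levels = c("NO", "YES"))
)

# Build the new sample so every factor column reuses the training levels.
testData <- data.frame(
  chr          = factor("chr18", levels = levels(mydata$chr)),
  start        = 52502759L,
  stop         = 52502887L,
  strand       = factor("*", levels = levels(mydata$strand)),
  num_probes   = 2L,
  segment_mean = -2.387,
  # The response is unknown for a new sample: NA, but with the training levels.
  is_nocnv     = factor(NA, levels = levels(mydata$is_nocnv))
)

# Check, column by column, that the factor levels now agree.
factor_cols <- names(mydata)[sapply(mydata, is.factor)]
levels_match <- sapply(factor_cols, function(col) {
  identical(levels(testData[[col]]), levels(mydata[[col]]))
})
print(levels_match)

# With the fitted tree from the question available, prediction would then be:
# testPred <- predict(albero, newdata = testData)
```

Since the true class of a new sample is unknown, the table(testPred, testData$is_nocnv) comparison only makes sense on held-out rows whose is_nocnv value is known.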

I then changed my code to this line:

testData <- data.frame(chr=chr1,start=start1,stop=stop1,strand=strand1,num_probes=num_probes1,segment_mean=segment_mean1,is_nocnv=levels=2)

This does not give the expected result either; it returns the following error:

Errore: unexpected '=' in "testData <- data.frame(chr=chr1,start=start1,stop=stop1,strand=strand1,num_probes=num_probes1,segment_mean=segment_mean1,is_nocnv=levels="
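
This second message is a parse error, not a modelling error: a function argument takes a single name = value pair, so is_nocnv=levels=2 is rejected before any code runs. A minimal sketch of the difference follows. The level set c("NO", "YES") is an assumption for illustration; in practice it would come from levels(mydata$is_nocnv).

```r
# `name = levels = 2` cannot be parsed; try() captures the syntax error.
bad <- try(parse(text = "data.frame(is_nocnv = levels = 2)"), silent = TRUE)
print(inherits(bad, "try-error"))

# A valid way to declare a response column with two levels and no known value.
# Assumed level set; replace with levels(mydata$is_nocnv) for the real data.
response_levels <- c("NO", "YES")
newRow <- data.frame(
  segment_mean = -2.387,
  is_nocnv     = factor(NA, levels = response_levels)
)
str(newRow)
```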

0 answers:

No answers yet