Question

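I'm experimenting with the features in my random forest model and noticed that a large number of instances are misclassified. How can I find the user IDs of those misclassified instances?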

  library(party)  # provides cforest() and cforest_unbiased()

  fit1 <- cforest((b == 'three') ~ affect + certain + negemo + future + swear + sad +
                    negate + ppron + sexual + death + filler + leisure + conj + funct + i +
                    past + bio + body + cause + cogmech + discrep + incl + motion + quant +
                    tentat + excl + insight + percept + posemo + relativ + space + article,
                  data = trainset1,
                  controls = cforest_unbiased(ntree = 1000, mtry = 1))

 table1 <- table(predict(fit1, OOB=TRUE, type = 'response') > 0.5, trainset1$b == 'three')

Result

        FALSE TRUE
 FALSE   213  200
 TRUE    821 1121

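The result shows that 821 cases from the other classes were misclassified as "three". How can I retrieve these 821 cases by user ID so that I can compare their features? Thanks.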

Answer 1

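So you want to reuse some of the code you already wrote to build that table, and use it to pick out the rows that fall in the bottom-left cell of the table.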

So this is the code inside your table() call that makes the table work:

predict(fit1, OOB=TRUE, type = 'response') > 0.5, trainset1$b == 'three'

If you run the first part on its own, you get a vector of all the predictions:

p<-predict(fit1, OOB=TRUE, type = 'response')

If you then apply the > 0.5 threshold, you get a vector of TRUE and FALSE values indicating whether each prediction is above or below that threshold:

tf<- p>0.5
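A minimal, self-contained sketch of the thresholding and cross-tabulation step, using made-up numbers in place of your forest's OOB predictions (the vectors `p` and `actual` and their values are hypothetical). It shows that the rows of the table are the thresholded predictions and the columns are the true labels, so the bottom-left cell, `["TRUE", "FALSE"]`, counts cases predicted as "three" that actually belong to another class.

```r
# Hypothetical stand-ins for the OOB probabilities and the true class labels
p      <- c(0.9, 0.2, 0.7, 0.6, 0.1, 0.8)
actual <- c("three", "one", "two", "three", "three", "one")

tf  <- p > 0.5                        # TRUE where the forest predicts "three"
tab <- table(tf, actual == "three")   # rows: prediction, columns: truth
print(tab)

# Bottom-left cell: predicted "three" (row TRUE) but actually another class (column FALSE)
false_pos <- tab["TRUE", "FALSE"]
print(false_pos)
```

On your data, `table1["TRUE", "FALSE"]` should return the 821 from your output.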

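Now, the last part supplies another vector of TRUE and FALSE values, trainset1$b == "three". What you want to know is which rows were classified as "three" (I assume that is when tf is TRUE, i.e. p > 0.5) but are not actually of class "three" (in terms of your question, trainset1$b == "three" is FALSE). To get those, you need all rows where tf == TRUE and trainset1$b != "three":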

newdata<- trainset1[tf==TRUE & trainset1$b!="three",]
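The question asks for user IDs rather than whole rows. Here is a sketch on a toy data frame, assuming the IDs live in a column of `trainset1`; the column name `user_id` and all values are hypothetical, so substitute your actual ID column.

```r
# Hypothetical toy training set; `user_id` stands in for your real ID column
trainset1 <- data.frame(
  user_id = c("u01", "u02", "u03", "u04", "u05"),
  b       = c("three", "one", "two", "three", "one"),
  stringsAsFactors = FALSE
)
p  <- c(0.9, 0.7, 0.2, 0.8, 0.6)   # hypothetical OOB probabilities
tf <- p > 0.5                      # TRUE where "three" is predicted

# Rows predicted "three" whose true class is something else
newdata <- trainset1[tf == TRUE & trainset1$b != "three", ]

# Just the IDs of those rows
misclassified_ids <- newdata$user_id
print(misclassified_ids)
```

Because `newdata` keeps every column, you can compare the features of these rows directly against the rest of `trainset1`.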

Double-check that nrow(newdata) is 821.
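One caveat for that check, shown with hypothetical toy data: if `b` contains NA, `table()` drops those pairs by default, but logical indexing turns each NA condition into an extra all-NA row, so `nrow(newdata)` can come out larger than the table cell. Wrapping the condition in `which()` keeps only the rows where it is TRUE. The data frame `toy` and its values are made up for illustration.

```r
# Hypothetical toy data with one missing label (NA in row 4)
toy <- data.frame(
  id = 1:5,
  b  = c("three", "one", "two", NA, "one"),
  stringsAsFactors = FALSE
)
p  <- c(0.9, 0.8, 0.3, 0.7, 0.6)   # hypothetical OOB probabilities
tf <- p > 0.5

# table() excludes the NA pair by default, so row 4 is not counted in the cell
cell <- table(tf, toy$b == "three")["TRUE", "FALSE"]

# Logical indexing: the NA condition in row 4 becomes an extra all-NA row
by_logical <- nrow(toy[tf == TRUE & toy$b != "three", ])

# which() keeps only positions where the condition is TRUE, dropping NA
by_which <- nrow(toy[which(tf & toy$b != "three"), ])

print(c(cell = cell, by_logical = by_logical, by_which = by_which))
```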
