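This seems like it should be a really obvious problem, but I have been through the code and it looks fine. The challenge is that the code works correctly when run line by line, but when knitted as an R Markdown document it picks the wrong random forest and prints the importance for that one.
I have tried reinstalling knitr, but that did not help.
The data is the Titanic train dataset.
I have 2 models, one called modRF and the other called mod2. I want to draw the chart for mod2, but the output is for modRF.
You can see this by changing the line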
imp<-importance(mod2$finalModel)
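to modRF$...
Like I said, when I run this code line by line it works; in R Markdown (knit to HTML) it produces the wrong chart. Can anyone shed some light on this?
PS: each random forest model takes less than a minute to run on my machine, so running this code should not take too long.
Thanks in advance for your help,
J
Here is the code to copy: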
suppressMessages(library(caret))
suppressMessages(library(randomForest))
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
setwd("~/Kaggle/Titanic")
totaltrain<-read.csv("train.csv")
#Adding features for EDA
totaltrain$CabinYes<-as.numeric(!(totaltrain$Cabin)=="")
ageid<-data.frame("minage"=c(0,20,30,40,50,60),
"AgeLabel"=c("Under 20","20-30","30-40","40-50","50-60","60+"))
#vlookup TRUE equivalent
totaltrain$AgeBracket<-ageid[findInterval(totaltrain$Age,ageid$minage),2]
#findInterval creates an index of which of the initial values most closely matches
#the lookup... Then use with the age id index and return the second column
a<-c(1,2,3,5,7,8,12,13,14)
rates<-totaltrain[,a]
rates$AgeBracket<-as.character(rates$AgeBracket)
rates$AgeBracket[is.na(rates$AgeBracket)]<-"Unknown"
rates$AgeBracket<-as.factor(rates$AgeBracket)
rates$Survived<-as.factor(rates$Survived)
rates$Pclass<-as.factor(rates$Pclass)
rates$CabinYes<-as.factor(rates$CabinYes)
```{r,cache=TRUE}
set.seed(4321)
inTrain <- createDataPartition(y=rates$Survived,
p=0.75, list=FALSE)
training<-rates[inTrain,]
testing<-rates[-inTrain,]
modRF<-train(Survived~.-PassengerId,data=training,method="rf",trControl=
               trainControl(method="cv",number = 3,
                            allowParallel = TRUE))
pred<-predict(modRF,newdata=testing)
testing$PredRight<-pred==testing$Survived
sum(testing$PredRight)/length(pred)
```
b<-c(1,2,3,5,6,7,8,12,13)
rates2<-totaltrain[,b]
rates2$Age[is.na(rates2$Age)]<-0
#Model 2
set.seed(2072)
inTrain <- createDataPartition(y=rates$Survived,
p=0.75, list=FALSE)
training<-rates[inTrain,]
testing<-rates[-inTrain,]
mod2<-train(Survived~.-PassengerId,data=training,method="rf",trControl=
              trainControl(method="cv",number = 3,
                           allowParallel = TRUE))
imp<-importance(mod2$finalModel)
impdf<-data.frame(Variables=row.names(imp),Importance=round(imp[,1],2))
rankimp<-impdf %>% mutate(Rank = paste0('#',dense_rank(-Importance)))
ggplot(rankimp, aes(x = reorder(Variables, Importance),
y = Importance, fill = Importance)) +
geom_bar(stat='identity') +
geom_text(aes(x = Variables, y = 0.5, label = Rank),
hjust=0, vjust=0.55, size = 4, colour = 'red') +
labs(x = 'Variables') +
coord_flip()
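
A side note on the `findInterval` lookup used for `AgeBracket` above, in case it helps anyone reading the data preparation step. This is only a minimal standalone sketch in base R with made-up ages; `breaks`, `labels`, `ages`, `idx` and `brackets` are illustrative names, not objects from the code above. It shows that each age is mapped to the last break it is greater than or equal to, and that a missing age stays `NA` (which is why the "Unknown" level is filled in afterwards).

```r
# Illustrative values only; these mirror the shape of ageid but are not the Titanic data
breaks <- c(0, 20, 30, 40, 50, 60)
labels <- c("Under 20", "20-30", "30-40", "40-50", "50-60", "60+")

ages <- c(5, 20, 29.5, 61, NA)

# findInterval returns, for each age, the index of the interval [breaks[i], breaks[i+1])
# that contains it; NA input gives NA output
idx <- findInterval(ages, breaks)

# Use the index to look up the label, like an approximate-match VLOOKUP
brackets <- labels[idx]

print(idx)
print(brackets)
```

Here an age of exactly 20 falls into "20-30" rather than "Under 20", because the intervals are closed on the left by default.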