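I am writing an R script that runs a Random Forest classification on my dataset multiple times. I want to use the average of at least 10 runs to get more robust results. So I have this function with a for loop that runs the Random Forest classifier as many times as I want (n = iterations).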
iterateRandomForest <- function (samples,iterations,output_text,outname,pVSURF,b) {
for (i in (1: iterations)) {
cat("\n Loop starts", "\n", file=output_text,append=TRUE)
time <- toString(Sys.time())
cat(time,"\n", file=output_text,append=TRUE)
cat("Iteration number ",i," for variable set: ", outname, "\n", sep="",file=output_text,append=TRUE)
load(pVSURF)
sel.vars <- x$varselect.pred + 1
colnames(samples[,sel.vars])
ptm <- proc.time() # Start timer to calculate processing length
(rf.final_ntree501 = randomForest(samples[,"species_na"], x=samples[,sel.vars],
ntree=b, importance=TRUE, norm.votes=TRUE, proximity=TRUE) ) # Run randomForest
### PROBLEM HERE
cat(rf.final_ntree501,file=output_text,append=TRUE)
### PROBLEM ENDS
cat("Processing time: ",proc.time() - ptm, "\n", file=output_text,append=TRUE) # Stop timer
cat("Loop ends\n", file=output_text,append=TRUE)
}
}
Normally you can just type the name of the created random forest object (rf.final_ntree501) to print the results, like this:
Call:
randomForest(x = samples[, sel.vars], y = samples[, "species_na"], ntree = b, importance = TRUE, proximity = TRUE, norm.votes = TRUE)
Type of random forest: classification
Number of trees: 501
No. of variables tried at each split: 4
OOB estimate of error rate: 45.43%
Confusion matrix:
                     Acacia mearnsii Cupressus lusitanica Eucalyptus sp. Euphorbia sp. Ficus sp. Grevillea robusta Maesa lanceolata other Persea americana class.error
Acacia mearnsii                   34                    1              3             0         0                 7                0    28                0   0.5342466
Cupressus lusitanica               4                    3              8             0         0                13                0    16                0   0.9318182
Eucalyptus sp.                     5                    0             35             0         0                15                0     8                0   0.4444444
Euphorbia sp.                      0                    0              1            16         0                 2                0    15                0   0.5294118
Ficus sp.                          0                    0              0             1         1                 5                0    17                0   0.9583333
Grevillea robusta                  5                    2              3             0         1                91                0    29                1   0.3106061
Maesa lanceolata                   4                    0              0             0         0                 2                0    14                0   1.0000000
other                             16                    0              3             4         1                27                1   189                1   0.2190083
Persea americana                   5                    1              0             0         0                 6                0    33                1   0.9782609
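So I would like to write this information to a file inside the loop (see the PROBLEM HERE part). I know that I cannot write the RF object directly with cat, because it is a list. If I try to save the confusion matrix separately with cat and rf.final_ntree501$confusion, the information is saved, but the matrix layout is lost: everything ends up on one line and the class names are not included.
Does anyone have a good idea how to handle this properly?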
Cheers, Rami
Answer 0 (score: 1)
Use capture.output() instead of cat() to write the results to the file exactly as they are displayed in the console.
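library(randomForest)
# output file (placeholder path; in the question this is the output_text argument)
output_text <- "rf_output.txt"
# generate random data
samples <- matrix(runif(675), ncol = 9)
resp <- as.factor(sample(LETTERS[1:9], 75, replace = TRUE))
# random forest
rf <- randomForest(x = samples, y = resp, ntree = 501,
                   importance = TRUE, norm.votes = TRUE, proximity = TRUE)
# save the printed summary (call, OOB error, confusion matrix) to the file
capture.output(rf, file = output_text, append = TRUE)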
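To see the difference between the two functions without fitting a forest, here is a minimal sketch that uses a small hand-made matrix as a hypothetical stand-in for rf$confusion (the class names, counts and temporary file names are made up for illustration). cat() writes only the underlying values, while capture.output() records what print() would show in the console.

```r
# Hypothetical 2x2 confusion matrix standing in for rf$confusion
conf <- matrix(c(34, 5, 7, 91), nrow = 2,
               dimnames = list(c("Acacia", "Grevillea"),
                               c("Acacia", "Grevillea")))

# Temporary files, used only for this illustration
out_cat <- tempfile(fileext = ".txt")
out_cap <- tempfile(fileext = ".txt")

# cat() drops the dim and dimnames attributes: values only, on one line
cat(conf, "\n", file = out_cat)

# capture.output() stores the printed representation, labels included
capture.output(conf, file = out_cap)

cat_lines <- readLines(out_cat)
cap_lines <- readLines(out_cap)

print(cat_lines)
print(cap_lines)
```

Inside the loop from the question, the same idea should apply: replacing the cat(rf.final_ntree501, ...) line with capture.output(rf.final_ntree501, file = output_text, append = TRUE) keeps the other cat() calls for the plain-text log lines unchanged.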
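To save the confusion matrix on its own, you can use write.table(). The result is written in a machine-readable format with the separator of your choice (a tab in the example).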
write.table(rf$confusion, file = "filename.txt", sep = "\t")
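If the file is meant to be read back later, it may help to pass col.names = NA, which adds an empty header cell above the row names so that every line has the same number of fields. The sketch below is only an illustration: the matrix is a hypothetical stand-in for rf$confusion and the file is a temporary one.

```r
# Hypothetical stand-in for rf$confusion, with class names containing spaces
conf <- matrix(c(34, 5, 7, 91), nrow = 2,
               dimnames = list(c("Acacia mearnsii", "Grevillea robusta"),
                               c("Acacia mearnsii", "Grevillea robusta")))

conf_file <- tempfile(fileext = ".txt")

# col.names = NA writes a blank header cell for the row-name column
write.table(conf, file = conf_file, sep = "\t", col.names = NA)

# Read it back; check.names = FALSE keeps the spaces in the class names
conf_back <- as.matrix(read.table(conf_file, header = TRUE, sep = "\t",
                                  row.names = 1, check.names = FALSE))

print(conf_back)
```

With a tab as separator, class names that contain spaces (as in the question's species list) stay in a single field.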