Question

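How does the R implementation of boosted regression trees (package gbm) handle missing values in predictor variables by default? Are they imputed, and if so, by which algorithm?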

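Background to my question: almost a year ago I ran an analysis using the script provided in Elith et al. 2008 (A working guide to boosted regression trees, Journal of Animal Ecology 77, 802-813) to call gbm. I have now realised that some of my predictor variables contain NAs, and I would like to know how boosted regression trees deal with them. Browsing various manuals and papers, I found statements such as "boosted regression trees can accommodate missing values", but I could not find an exact description of what gbm does with missing values. The analysis itself ran without problems, so gbm must handle them one way or another. The gbm manual even contains an example in which NAs are introduced deliberately to show that gbm carries on working without trouble. Now I would like to know what exactly gbm does with NAs (skip them, impute them, ...?).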

Answer 1

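The gbm function can be used for imputation, as described on Jeffrey Wong's blog. Missing values give rise to surrogate splits, so users can still obtain predictions for items with an incomplete set of predictor variables.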

He has developed a package based on this approach. In the GitHub repo, the header of one of the files, the one for gbm, reads:

#' GBM Imputation
#'
#' Imputation using Boosted Trees
#' Fill each column by treating it as a regression problem. For each
#' column i, use boosted regression trees to predict i using all other
#' columns except i. If the predictor variables also contain missing data,
#' the gbm function will itself use surrogate variables as substitutes for the predictors.
#' This imputation function can handle both categorical and numeric data.
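The column-by-column idea in that header can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the repo: the helper name `impute_numeric_with_gbm` and the toy data are made up, the tuning values are arbitrary, and unlike the function described in the header it handles numeric columns only.

```r
library(gbm)

# Hypothetical helper, not the code from the linked repo:
# impute numeric columns one at a time with boosted trees.
impute_numeric_with_gbm <- function(df, n.trees = 100) {
  out <- df
  for (col in names(df)) {
    miss <- is.na(df[[col]])
    if (!any(miss)) next
    # Predict `col` from all other columns, training on rows where it is observed.
    # The original df is used throughout, so predictors may themselves contain NAs.
    f <- reformulate(setdiff(names(df), col), response = col)
    fit <- gbm(f, data = df[!miss, , drop = FALSE],
               distribution = "gaussian", n.trees = n.trees,
               interaction.depth = 2, n.minobsinnode = 5,
               verbose = FALSE)
    out[[col]][miss] <- predict(fit, newdata = df[miss, , drop = FALSE],
                                n.trees = n.trees)
  }
  out
}

# Made-up data with NAs in two of the three columns
set.seed(42)
d <- data.frame(a = rnorm(200), b = rnorm(200))
d$c <- d$a + d$b + rnorm(200, sd = 0.1)
d$a[sample(200, 20)] <- NA
d$c[sample(200, 20)] <- NA

filled <- impute_numeric_with_gbm(d)
```

Each model is fitted on the original data rather than on the partly filled copy, so the result does not depend on the order in which columns are processed.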

I found that file simply by typing this into Google: how gbm handles missing values. It was the second hit.

https://github.com/jeffwong/imputation/blob/master/R/gbmImpute.R
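If you would rather check the behaviour on your own installation, one option is to fit a small model on data with deliberately introduced NAs and look at a fitted tree. The sketch below uses made-up data and arbitrary tuning values. `pretty.gbm.tree` returns one row per node, and its `MissingNode` column sits alongside `LeftNode` and `RightNode`, which shows where observations with a missing value for the split variable are sent.

```r
library(gbm)

# Made-up data: two numeric predictors, some values of x1 blanked out
set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 * x1 + x2 + rnorm(n, sd = 0.1)
x1[sample(n, 50)] <- NA
d  <- data.frame(y = y, x1 = x1, x2 = x2)

# Fitting proceeds despite the NAs in x1
fit <- gbm(y ~ x1 + x2, data = d, distribution = "gaussian",
           n.trees = 50, interaction.depth = 2, shrinkage = 0.1,
           n.minobsinnode = 10, verbose = FALSE)

# One row per node of the first tree, including a MissingNode column
tree1 <- pretty.gbm.tree(fit, i.tree = 1)
print(tree1)

# Predictions are also returned for rows with a missing predictor
newd <- data.frame(x1 = c(NA, 0.5), x2 = c(0.5, 0.5))
p <- predict(fit, newdata = newd, n.trees = 50)
print(p)
```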
