R xgboost importance plot with many features

Time: 2016-09-07 13:49:26

Tags: r ggplot2 xgboost kaggle

I am trying the Kaggle house prices challenge: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

This is the script I wrote:

library(Matrix)   # provides sparse.model.matrix()
library(xgboost)

train <- read.csv("train.csv")
train$Id <- NULL

# Keep rows containing NA while building the model matrix, then restore the old setting.
previous_na_action <- options('na.action')
options(na.action = 'na.pass')
sparse_matrix <- sparse.model.matrix(SalePrice ~ . - 1, data = train)
options(previous_na_action)

model <- xgboost(data = sparse_matrix, label = train$SalePrice, missing = NA,
                 max.depth = 6, eta = 0.3, nthread = 4, nrounds = 16,
                 verbose = 2, objective = "reg:linear")

importance <- xgb.importance(feature_names = sparse_matrix@Dimnames[[2]], model = model)
print(xgb.plot.importance(importance_matrix = importance))

The data contains more than 70 features. I used xgboost with max.depth = 6 and nrounds = 16.

The importance plot I get is very cluttered. How can I view only the top 5 features, or something similar?

2 Answers:

Answer 0: (score: 3)

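Take a look at the top_n argument of xgb.plot.importance. It does exactly what you want.

# Plot only top 5 most important variables.
print(xgb.plot.importance(importance_matrix = importance, top_n = 5))

Edit: top_n is only available in the development version of xgboost. On a release version, the alternative is to pass only the first rows of the importance table to xgb.plot.importance.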
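For xgboost versions that lack top_n, the subsetting can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name top_features is hypothetical, and the toy table (with made-up Gain values) stands in for the output of xgb.importance so that the snippet runs without a trained model.

```r
# Hypothetical helper: keep the n rows with the largest Gain.
# Sorting explicitly means the result does not depend on the input order.
top_features <- function(importance, n = 5) {
  ordered <- importance[order(importance$Gain, decreasing = TRUE), ]
  head(ordered, n)
}

# Toy stand-in for the table returned by xgb.importance()
# (the Gain values are illustrative only, not real results).
toy_importance <- data.frame(
  Feature = c("OverallQual", "GrLivArea", "TotalBsmtSF", "GarageCars",
              "YearBuilt", "LotArea", "Fireplaces"),
  Gain    = c(0.40, 0.20, 0.12, 0.10, 0.08, 0.06, 0.04)
)

top5 <- top_features(toy_importance, n = 5)
print(top5$Feature)

# With the model from the question, the same idea would be:
# print(xgb.plot.importance(importance_matrix = top_features(importance, 5)))
```

The plot then contains five bars instead of one per column of the sparse matrix.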

Answer 1: (score: 0)

xgbImp1 <- xgb.importance(model = model)

This computes the importance of each feature in the model.

library(dplyr)  # provides %>%, mutate(), dense_rank() and desc()
xgbImp1 <- xgbImp1 %>% mutate(rank = dense_rank(desc(Gain)))

This gives each feature a rank, so the cutoff can be changed to the top 5, 10, 15, or 20.

library(ggplot2)
# Bar chart of the 20 highest-ranked features, ordered by decreasing Gain.
ggplot(data = xgbImp1[which(xgbImp1$rank <= 20), ], aes(x = reorder(Feature, -Gain), y = Gain)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "XG Boosted Feature Importance (Top 20)", x = "Features", y = "Information Gain")
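Because the cutoff of 20 appears both in the filter and in the title, the plot could be wrapped in a small function that takes the cutoff as an argument. The following is a sketch rather than part of the answer: the function name plot_top_gain is hypothetical, the toy table with made-up Gain values stands in for the result of xgb.importance, and coord_flip() is a choice made here so that long feature names stay readable.

```r
library(ggplot2)

# Hypothetical helper: bar chart of the n features with the highest Gain.
plot_top_gain <- function(importance, n = 20) {
  imp <- as.data.frame(importance)
  imp <- head(imp[order(imp$Gain, decreasing = TRUE), ], n)
  ggplot(imp, aes(x = reorder(Feature, Gain), y = Gain)) +
    geom_bar(stat = "identity") +
    coord_flip() +  # horizontal bars, largest Gain at the top
    labs(title = paste0("XGBoost Feature Importance (Top ", n, ")"),
         x = "Features", y = "Gain")
}

# Toy stand-in for the output of xgb.importance() (illustrative values only).
toy_importance <- data.frame(
  Feature = paste0("f", 1:8),
  Gain    = c(0.30, 0.22, 0.15, 0.11, 0.09, 0.06, 0.04, 0.03)
)

p <- plot_top_gain(toy_importance, n = 5)
print(p)
```

Unlike dense_rank, head() keeps exactly n rows, so features tied on Gain at the cutoff are not all retained.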