Question

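I am using lime in R. LIME is a method for building a local linear model that approximates a complex model. The R package returns the coefficients of the linear model and its prediction. I am trying to reconstruct the prediction from the coefficients to better understand how LIME works, but my reconstruction does not produce the same result.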

My code is

library(MASS)
library(lime)
library(caret)
library(dplyr)
library(tidyr) # provides gather(), used below
data(biopsy)

# First we'll clean up the data a bit
biopsy$ID <- NULL
biopsy <- na.omit(biopsy)
names(biopsy) <- c('clump thickness', 'uniformity of cell size', 
                   'uniformity of cell shape', 'marginal adhesion',
                   'single epithelial cell size', 'bare nuclei', 
                   'bland chromatin', 'normal nucleoli', 'mitoses',
                   'class')

set.seed(4)
test_set <- sample(seq_len(nrow(biopsy)), 4)
data_train = biopsy[-test_set,] %>% dplyr::select(-class)
class_train = biopsy[-test_set,] %>% .[["class"]] %>% factor
data_test = biopsy[test_set,] %>% dplyr::select(-class)
class_test = biopsy[test_set,] %>% .[["class"]] %>% factor
model = train(data_train, class_train, method="rf") # Random Forest

explainer <- lime(data_train, model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(data_test[1,], explainer, n_labels = 1, n_features = 4)

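For the test observation data_test[1,], I want to inspect the local linear model (fitted by ridge regression). I store the model-level information in model_expl; the intercept is of particular interest. The feature-level information, such as the coefficients, is stored in feature_expl.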

model_expl = explanation %>%
  dplyr::select(-starts_with("feature")) %>%
  filter(case == .$case[1]) %>%
  unique %>%
  mutate_all(as.character) %>% # convert every column, numeric ones included
  gather(key, value)

feature_expl = explanation %>%
  dplyr::select(case, starts_with("feature")) %>%
  filter(case == .$case[1])

Printed results

1 model_type      classification                                                                              
2 case            416                                                                                         
3 label           benign                                                                                      
4 label_prob      0.552                                                                                       
5 model_r2        0.475778176360649                                                                           
6 model_intercept 0.104316310033944                                                                           
7 model_predicti… 0.715122989626457                                                                           
8 data            list(`clump thickness` = 3, `uniformity of cell size` = 3, `uniformity of cell shape` = 2, …
9 prediction      list(benign = 0.552, malignant = 0.448)


  case                  feature feature_value feature_weight                     feature_desc
1  416                  mitoses             1      0.0253919                  mitoses <= 3.25
2  416              bare nuclei             3      0.2476868              bare nuclei <= 3.25
3  416  uniformity of cell size             3      0.1792691  uniformity of cell size <= 3.25
4  416 uniformity of cell shape             2      0.1584589 uniformity of cell shape <= 3.25

From the explanation I get 0.715906270331288 in model_prediction. Using the intercept of 0.114219195393416, I try to reconstruct the local approximation:

sum(feature_expl$feature_value * feature_expl$feature_weight) + 0.114219195393416

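But I get 2.155599 instead of 0.715906270331288. I have read that I need to scale the features, but I cannot find out how to perform the scaling correctly. What do I need to do to reconstruct the local prediction?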
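
A minimal sketch of one possible reconstruction, using only the numbers in the printed results above (intercept 0.104316310033944, model_prediction 0.715122989626457). It rests on an assumption about lime: with bin_continuous = TRUE, each selected feature seems to enter the local model as a 0/1 indicator ("the sample falls in the same bin as the explained observation") rather than as its raw value, so the explained row itself would be all ones. The data frame below is rebuilt by hand from the printed table, and the names intercept, raw_value_attempt and indicator_attempt are illustrative only.

```r
# Values copied by hand from the printed explanation (case 416)
intercept <- 0.104316310033944
feature_expl <- data.frame(
  feature = c("mitoses", "bare nuclei",
              "uniformity of cell size", "uniformity of cell shape"),
  feature_value = c(1, 3, 3, 2),
  feature_weight = c(0.0253919, 0.2476868, 0.1792691, 0.1584589)
)

# Attempt from the question: multiply the weights by the raw feature values
raw_value_attempt <- intercept +
  sum(feature_expl$feature_value * feature_expl$feature_weight)

# Assumption: under binning, the explained observation is encoded as 1
# for every selected feature, so the weights are simply summed
indicator_attempt <- intercept + sum(feature_expl$feature_weight)

print(raw_value_attempt)
print(indicator_attempt)
```

With these numbers, the second value agrees with the printed model_prediction to about six decimal places, while the first does not. That suggests no extra scaling is needed in the binned case, although this is inferred from the output shown here rather than from the package documentation.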

Reconstructing the local prediction in LIME

0 answers: