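I am using lime in R. LIME is a method for building local linear models from a complex model. The R package returns the coefficients of the linear model and the linear model's prediction. I am trying to reconstruct the prediction from the coefficients to better understand how lime works, but my reconstruction does not give the same result.
My code is: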
library(MASS)
library(lime)
library(caret)
library(dplyr)
library(tidyr) # provides gather(), which is used further down
data(biopsy)
# First we'll clean up the data a bit
biopsy$ID <- NULL
biopsy <- na.omit(biopsy)
names(biopsy) <- c('clump thickness', 'uniformity of cell size',
'uniformity of cell shape', 'marginal adhesion',
'single epithelial cell size', 'bare nuclei',
'bland chromatin', 'normal nucleoli', 'mitoses',
'class')
set.seed(4)
test_set <- sample(seq_len(nrow(biopsy)), 4)
data_train = biopsy[-test_set,] %>% dplyr::select(-class)
class_train = biopsy[-test_set,] %>% .[["class"]] %>% factor
data_test = biopsy[test_set,] %>% dplyr::select(-class)
class_test = biopsy[test_set,] %>% .[["class"]] %>% factor
model = train(data_train, class_train, method="rf") # Random Forest
explainer <- lime(data_train, model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(data_test[1,], explainer, n_labels = 1, n_features = 4)
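For the test observation data_test[1,], I want to inspect the local linear model (fitted by ridge regression). I store its summary in model_expl; the intercept is of particular interest. The per-feature information (such as the coefficients) is stored in feature_expl.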
model_expl = explanation %>%
dplyr::select(-starts_with("feature")) %>%
filter(case == .$case[1]) %>%
unique %>%
mutate_all(as.character) %>% # convert every column to character so gather() can stack them
gather(key, value)
feature_expl = explanation %>%
dplyr::select(case, starts_with("feature")) %>%
filter(case == .$case[1])
Printing model_expl and feature_expl gives:
1 model_type classification
2 case 416
3 label benign
4 label_prob 0.552
5 model_r2 0.475778176360649
6 model_intercept 0.104316310033944
7 model_predicti… 0.715122989626457
8 data list(`clump thickness` = 3, `uniformity of cell size` = 3, `uniformity of cell shape` = 2, …
9 prediction list(benign = 0.552, malignant = 0.448)
case feature feature_value feature_weight feature_desc
1 416 mitoses 1 0.0253919 mitoses <= 3.25
2 416 bare nuclei 3 0.2476868 bare nuclei <= 3.25
3 416 uniformity of cell size 3 0.1792691 uniformity of cell size <= 3.25
4 416 uniformity of cell shape 2 0.1584589 uniformity of cell shape <= 3.25
From the explanation I get a model_prediction of 0.715906270331288.
Using the intercept of 0.114219195393416, I try to reconstruct the local approximation:
sum(feature_expl$feature_value * feature_expl$feature_weight) + 0.114219195393416
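but I get 2.155599 instead of 0.715906270331288. I have read that I need to apply scaling, but I cannot find out how to do the scaling correctly. What do I need to do to reconstruct the local prediction?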
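One check I can run against the printed tables is sketched below. It rests on an assumption that I have not confirmed in the lime source: with bin_continuous = TRUE, the local model is fitted on 0/1 bin-membership indicators rather than on the raw feature values, so the explained case has the value 1 for every selected feature. The intercept, weights and model_prediction are copied from the printed output; the variable names (indicator, reconstructed, difference) are my own.

```r
# Values copied from the printed model_expl and feature_expl tables.
intercept <- 0.104316310033944
reported_prediction <- 0.715122989626457
feature_weight <- c(
  "mitoses"                  = 0.0253919,
  "bare nuclei"              = 0.2476868,
  "uniformity of cell size"  = 0.1792691,
  "uniformity of cell shape" = 0.1584589
)

# ASSUMPTION (not confirmed): with bin_continuous = TRUE each feature enters
# the local model as a 0/1 indicator ("same bin as the explained case?"),
# so the explained case itself has the value 1 for every selected feature.
indicator <- rep(1, length(feature_weight))

# Local prediction = intercept + sum of (indicator * weight).
reconstructed <- intercept + sum(indicator * feature_weight)

# The weights are printed to 7 decimals, so agreement can only be
# expected up to that rounding.
difference <- reconstructed - reported_prediction
print(c(reconstructed = reconstructed, difference = difference))
```

With the printed values, the intercept plus the plain sum of the weights agrees with the printed model_prediction up to the rounding of the weights, which would suggest that feature_value should not be multiplied in at all when the features are binned. I cannot repeat the same check for 0.715906270331288 and 0.114219195393416, because the weights that belong to those two numbers are not in the printed output.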