In the code below, map_dfr from purrr works, but future_map_dfr from furrr throws an error. How can I fix this?
#install.packages("randomForest"); install.packages("tidyverse"); install.packages("iml")
library(tidyverse); library(iml); library(randomForest)
library(furrr)
plan(multiprocess)
set.seed(42)
mtcars1 <- mtcars %>% mutate(vs = as.factor(vs),
                             id = row_number())
x <- "vs"
y <- paste0(setdiff(setdiff(names(mtcars1), "vs"), "id"), collapse = "+")
rf = randomForest(as.formula(paste0(x, "~ ", y)), data = mtcars1, ntree = 50)
predictor <- Predictor$new(rf, data = mtcars1, y = mtcars1$vs)
# Results using map_dfr() from purrr
shapelyresults <- map_dfr(1:nrow(mtcars), ~(
  Shapley$new(predictor, x.interest = mtcars1[.x,]) %>%
    .$results %>%
    as_tibble() %>%
    arrange(desc(phi)) %>%
    slice(1:5) %>%
    select(feature.value, phi) %>%
    mutate(id = .x)))
# Attempt to use future_map_dfr() from furrr
f_shapelyresults <- future_map_dfr(1:nrow(mtcars), ~(
  Shapley$new(predictor, x.interest = mtcars1[.x,]) %>%
    .$results %>%
    as_tibble() %>%
    arrange(desc(phi)) %>%
    slice(1:5) %>%
    select(feature.value, phi) %>%
    mutate(id = .x)))
Answer 0 (score: 1)
furrr's futures can run in R subprocesses mapped to different CPU cores or threads, each with its own environment/scope.
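As a rough illustration of the separate-process point, the sketch below compares the process ID of the main session with the process IDs seen inside the mapped function. It assumes a `multisession` plan with two workers; the variable names `main_pid` and `worker_pids` are made up for this example.

```r
library(furrr)  # also attaches the future package, which provides plan()

# Assumption: two background R sessions are enough to show the effect
plan(multisession, workers = 2)

# Process ID of the session that launches the futures
main_pid <- Sys.getpid()

# Process IDs reported from inside the mapped function, i.e. on the workers
worker_pids <- future_map_int(1:4, ~ Sys.getpid())

# Return to sequential processing and shut the workers down
plan(sequential)

print(main_pid)
print(unique(worker_pids))
```

Because the mapped function runs in other R processes, anything it needs has to be loaded there or shipped to it.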
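In my experience, two types of problems typically come up:
- packages loaded in the main session are not loaded in the subprocess;
- objects defined in the main session are not found in the subprocess.
So you could: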
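- Rewrite the purrr lambda function as a named function, and put require() calls at the top of the function to rule out the first type of problem.
- In the named function, also pass the auxiliary data as arguments.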
Try something like this:
library(furrr)
my_function <-
  function(primary_object, Shapley_object, predictor, data) {
    # Load the packages inside the function so each subprocess has them
    require(tidyverse); require(iml); require(randomForest)
    Shapley_object$new(predictor,
                       x.interest = data[primary_object, ]) %>%
      .$results %>%
      as_tibble() %>%
      arrange(desc(phi)) %>%
      slice(1:5) %>%
      select(feature.value, phi) %>%
      mutate(id = primary_object)
  }
f_shapelyresults <-
  future_map_dfr(
    .x = 1:nrow(mtcars), # 1st argument: primary_object, above
    .f = my_function,
    Shapley_object = Shapley, # 2nd argument, as seen above
    predictor = predictor, # auxiliary objects passed explicitly
    data = mtcars1
  )
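A smaller, self-contained sketch of the same pattern may be easier to test in isolation. It assumes furrr 0.2.0 or later (for `furrr_options()`), that tibble and dplyr are installed, and it uses `plan(multisession)`, since `multiprocess` is deprecated in recent versions of the future package. The names `lookup` and `scale_value` are hypothetical stand-ins for the auxiliary data and the named function.

```r
library(furrr)

plan(multisession, workers = 2)

# Hypothetical auxiliary data that the workers need
lookup <- c(a = 1, b = 2, c = 3)

# Named function: the auxiliary data arrives as an argument
# instead of being looked up in the global environment
scale_value <- function(i, lookup_table) {
  tibble::tibble(id = i, value = lookup_table[[i]] * 10)
}

res <- future_map_dfr(
  .x = seq_along(lookup),
  .f = scale_value,
  lookup_table = lookup,  # forwarded to scale_value() through ...
  # Attach the listed packages on every worker before the function runs
  .options = furrr_options(packages = "tibble")
)

plan(sequential)

print(res)
```

Declaring packages through `furrr_options(packages = ...)` is an alternative to calling `require()` inside the function; either approach is meant to make the package available on the workers.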