Question

I am trying to understand how `pmap` works. The tibble below contains a list-column, `values`. I want to create a new column, `New`, whose value depends on whether the corresponding element of `values` is NULL. Since `is.null` is not vectorised, I first thought of using `rowwise()`, and then came across `pmap()`.

Using `rowwise()` before `mutate()` gives the desired result, as shown below:

tbl = as.data.frame(do.call(rbind, pars)) %>%
      rowwise() %>%
      mutate(New = ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", ")))

> tbl
Source: local data frame [2 x 6]
Groups: <by row>

# A tibble: 2 x 6
  id        lower     upper     values     default   New
  <list>    <list>    <list>    <list>     <list>    <chr>
1 <chr [1]> <dbl [1]> <dbl [1]> <NULL>     <dbl [1]> a 5
2 <chr [1]> <NULL>    <NULL>    <list [3]> <chr [1]> b 0, b 1, b 2

However, `pmap()` does not:

tbl = as.data.frame(do.call(rbind, pars)) %>%
      mutate(New = pmap(., ~ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", "))))

> tbl
  id lower upper  values default                         New
1  a     1    10    NULL       5 a NULL, b list("0", "1", "2")
2  b  NULL  NULL 0, 1, 2       1 a NULL, b list("0", "1", "2")

If I use an anonymous function with fully specified arguments in place of the tilde, it seems to work.

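But I don't understand why the tilde version fails. I would rather not have to fully specify the arguments, because I need to map the function over many columns. Where am I going wrong?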
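The difference between the two versions can be reproduced on a small scale. The sketch below is only an illustration: `tbl` is a hypothetical stand-in (the original `pars` is not shown), and `pmap_chr()` is used instead of `pmap()` to get a character column directly. It assumes that a purrr formula only exposes its inputs as `..1`, `..2`, `...` and so on, so a bare column name inside the formula is looked up in the surrounding `mutate()` call and refers to the whole column rather than to the current row.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the question's table (the original `pars` is not shown).
tbl <- tibble(
  id      = c("a", "b"),
  values  = list(NULL, c("0", "1", "2")),
  default = c("5", "1")
)

# Formula version: `id` and `values` are not bound per row by the formula,
# so inside mutate() they refer to the whole columns.
tilde <- tbl %>%
  mutate(New = pmap_chr(list(id, values, default),
                        ~ paste(id, values, collapse = ", ")))

# Function with named arguments: pmap_chr() matches the list names to the
# argument names, so each argument holds the value for one row only.
named <- tbl %>%
  mutate(New = pmap_chr(
    list(id = id, values = values, default = default),
    function(id, values, default) {
      if (is.null(values)) paste(id, default)
      else paste(id, values, collapse = ", ")
    }
  ))

named$New
```

In the formula version every row receives the same string, built from all rows at once, which matches the repeated value seen in the `pmap()` output above.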

Answer 1

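I was about to ask a very similar question: basically, how to use `pmap` inside `mutate` without having to repeat the variable names several times. Instead, I am posting it here as an "answer", because it contains a reprex and a number of options I have found, none of which fully satisfies me. Hopefully someone else can answer how to do it as required.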

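When working with data.frames that have list-columns, I often want to use `purrr::pmap` inside `dplyr::mutate`. Sometimes this involves a lot of repeated variable names. I would like to do this more concisely with an anonymous function, so that each variable is named only once when passed to the `.f` argument of `pmap`.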

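Take this small dataset as an example:

library('dplyr')
library('purrr')

df <- tribble(
  ~x,   ~y,      ~z,
  c(1), c(1,10), c(1, 10, 100),
  c(2), c(2,20), c(2, 20, 200),
)

Say the function I want to apply to each row is

func <- function(x, y, z){c(sum(x), sum(y), sum(z))}

In practice, the function would be more complex and involve many variables. It only needs to be used once, so I would prefer not to name it explicitly and clog up my script and working environment.

Here are the options. Each creates exactly the same data.frame, but in a different way. The reason for including `avg` will become clear. Note that I am not considering position matching using `..1`, `..2`, etc., because it is easy to get wrong.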

# Explicitly create a function for `.f`.
# This requires using the variable names (x, y, z) three times.
# It's completely clear what it's doing, but needs a lot of typing.
# It might sometimes fail - see https://github.com/tidyverse/purrr/issues/280
df_explicit <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x, y, z), .f = function(x, y, z){ c(sum(x), sum(y), sum(z)) })
  )

# Pass the whole of `df` to `.l` and add `...` in an explicit function to deal with any unused columns.
# Variable names are used twice.
# `df` will have to be passed explicitly if not using pipes (eg, `mutate(.data = df, a = pmap(.l = df, ...`).
# This is probably inefficient for large datasets.
df_dots <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = ., .f = function(x, y, z, ...){ c(sum(x), sum(y), sum(z)) })
  )

# Use `pryr::f` (as discussed in https://stackoverflow.com/a/51123520/4269699).
# Variable names are used twice.
# Potentially unexpected behaviour.
# Not obvious to the casual reader why the extra `pryr::f` is needed and what it's doing.
df_pryrf <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x,y,z), .f = pryr::f({c(sum(x), sum(y), sum(z))} ))
  )

# Use `rowwise()` similar to this: https://stackoverflow.com/a/47734073/4269699
# Variable names are used once.
# It will mess up any vectorised functions used elsewhere in mutate, hence the two `mutate()`s.
df_rowwise <- df %>%
  mutate(
    avg = x - mean(x)
  ) %>%
  rowwise() %>%
  mutate(
    a = list( {c(sum(x), sum(y), sum(z))} )
  ) %>%
  ungroup()

# Use Romain Francois' neat {rap} package.
# Variable names used once.
# Like `rowwise()` it will mess up any vectorised functions so it needs two `mutate()`s for this particular problem.
library('rap') # devtools::install_github("romainfrancois/rap")
df_rap <- df %>%
  mutate(
    avg = x - mean(x)
  ) %>%
  rap(
    a = ~ c(sum(x), sum(y), sum(z))
  )

# Another solution discussed here https://stackoverflow.com/a/51123520/4269699 doesn't seem to work inside `mutate()`, but maybe could be tweaked?
# Like the `pryr::f` solution, it's not immediately obvious what the purpose of the `with(list(...` bit is.
df_with <- df %>%
  mutate(
    avg = x-mean(x),
    a = pmap(.l = list(x,y,z), .f = ~with(list(...), { c(sum(x), sum(y), sum(z))} ))
  )

As far as I can tell, these are the options, excluding position matching.

Ideally, there would be a function, call it `qmap`, that knows to look up the (row-wise) variables `x`, `y` and `z` in the object passed to the `.data` argument of `mutate`, so that the row-wise expression could be written once with no argument list.

But I don't know how to do this, so consider this only a partial answer.
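One possible tweak to the `with(list(...))` option, offered as a sketch rather than a confirmed solution: in that option `list(x, y, z)` is unnamed, so `list(...)` inside the formula carries no names for `with()` to find. Passing the data frame itself (a named list of columns) as `.l` should make `x`, `y` and `z` available per row, with each name written only once. The name `df_with_named` is made up for this illustration, and like the `...` option it passes every column of `df` to each call.

```r
library(dplyr)
library(purrr)

df <- tribble(
  ~x, ~y,       ~z,
  1,  c(1, 10), c(1, 10, 100),
  2,  c(2, 20), c(2, 20, 200)
)

# Passing the data frame itself gives pmap() a *named* list, so list(...)
# inside the lambda carries the names x, y and z for the current row.
df_with_named <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = ., .f = ~ with(list(...), c(sum(x), sum(y), sum(z))))
  )

df_with_named$a[[1]]
```

Because `avg` is computed in the same `mutate()` on the whole column, this form does not need the second `mutate()` that the `rowwise()` and `rap()` options require.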
