Storing and calling variables in a column in dplyr within a function

Date: 2019-01-29 09:39:15

Tags: r dplyr rlang nse

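I want to store some variable names in the cells of a tibble column. Then I want to either call that column and paste together the names of those variables, or call that column and paste together the values of the columns those names refer to. All of this happens inside a function, and it is the only hard-coded part left, so I would really like to find a way around it.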

library("tidyverse") 
myData<-tibble("c1"=c("a","b","c"),
"c2"=c("1","2","3"),
"c3"=c("A","B","C"),
factors=c(list(c("c1","c2")),list(c("c2","c3")),list(c("c1","c2","c3"))))

myData%>%mutate(factors1=interaction(!!!quos(factors),sep=":",lex.order=TRUE))
# A tibble: 3 x 5
  c1    c2    c3    factors   factors1
  <chr> <chr> <chr> <list>    <fct>   
1 a     1     A     <chr [2]> c1:c2:c1
2 b     2     B     <chr [2]> c2:c3:c2
3 c     3     C     <chr [3]> c1:c2:c3

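So this lets me concatenate the names of the variables, but as you can see, if one list element is longer than the others, the shorter ones get recycled.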

For the second problem, where I want to use the $factors column to call the values of the other columns, I can hard-code it like this:

myData%>%
mutate(factors2=interaction(!!!syms(c("c1","c2")),sep=":",lex.order=TRUE))
# A tibble: 3 x 5
  c1    c2    c3    factors   factors2
  <chr> <chr> <chr> <list>    <fct>
1 a     1     A     <chr [2]> a:1
2 b     2     B     <chr [2]> b:2
3 c     3     C     <chr [3]> c:3

However, if I try this:

myData%>%
mutate(factors2=interaction(!!!syms(factors),sep=":",lex.order=TRUE))

Error in lapply(.x, .f, ...) : object 'factors' not found

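The same thing happens if I try to unlist factors or use other rlang expressions. I have also tried nesting rlang expressions, but so far I have not found one that does what I expect.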

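I feel this should be doable, but so far I have not found a Stack Overflow question or a tutorial that shows how it could be done. Thank you all for your time and help.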

My full code:

library("tidyverse") 

myData<-tibble("c1"=c("a","b","c"),
"c2"=c("1","2","3"),
"c3"=c("A","B","C"),
factors=c(list(c("c1","c2")),list(c("c2","c3")),list(c("c1","c2","c3"))))%>%
mutate(factors1=interaction(!!!quos(factors),sep=":",lex.order=TRUE))%>%
mutate(factors2=interaction(!!!syms(factors),sep=":",lex.order=TRUE))

My desired output is:

# A tibble: 3 x 6
  c1    c2    c3    factors   factors1 factors2
  <chr> <chr> <chr> <list>    <fct>    <fct>
1 a     1     A     <chr [2]> c1:c2    a:1
2 b     2     B     <chr [2]> c2:c3    2:B
3 c     3     C     <chr [3]> c1:c2:c3 c:3:C

2 answers:

Answer 0: (score: 1)

Your first problem can be solved with purrr::map and the purrr::lift family of functions:

myData %>%
  mutate( factors1 = map(factors, lift_dv(interaction, sep=":", lex.order=TRUE)) ) %>%
  mutate_at( "factors1", lift(fct_c) )
# # A tibble: 3 x 5
#   c1    c2    c3    factors   factors1
#   <chr> <chr> <chr> <list>    <fct>
# 1 a     1     A     <chr [2]> c1:c2
# 2 b     2     B     <chr [2]> c2:c3
# 3 c     3     C     <chr [3]> c1:c2:c3
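As a rough alternative for this first step, and assuming a plain character column is acceptable instead of a factor, the names in each list element can simply be collapsed with paste(). This sketch swaps interaction() and the lift helpers for purrr::map_chr(); the column name factors1_chr is made up for illustration and does not appear in the original answer.

```r
library(tibble)
library(dplyr)
library(purrr)

# Same example data as in the question
myData <- tibble(
  c1 = c("a", "b", "c"),
  c2 = c("1", "2", "3"),
  c3 = c("A", "B", "C"),
  factors = list(c("c1", "c2"), c("c2", "c3"), c("c1", "c2", "c3"))
)

# Collapse each character vector in the list column into one string.
# factors1_chr is a hypothetical column name chosen for this sketch.
result <- myData %>%
  mutate(factors1_chr = map_chr(factors, paste, collapse = ":"))

result$factors1_chr
# "c1:c2" "c2:c3" "c1:c2:c3"
```

Because each list element is handled on its own, no recycling across rows takes place. If a factor is needed, the result could be wrapped in factor() afterwards.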

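The second problem is trickier, because !!! causes immediate evaluation of its argument, which can sometimes lead to unintuitive operator precedence inside dplyr chains. The cleanest approach is to define a standalone function that composes your interaction expression: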

f <- function(fct) {expr( interaction(!!!syms(fct), sep=":", lex.order=TRUE) )}

# Example usage
f( myData$factors[[1]] )    # interaction(c1, c2, sep = ":", lex.order = TRUE)
f( myData$factors[[2]] )    # interaction(c2, c3, sep = ":", lex.order = TRUE)

myData %>% mutate( e = map(factors, f) )
# # A tibble: 3 x 5
#   c1    c2    c3    factors   e
#   <chr> <chr> <chr> <list>    <list>
# 1 a     1     A     <chr [2]> <language>
# 2 b     2     B     <chr [2]> <language>
# 3 c     3     C     <chr [3]> <language>

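Unfortunately, we cannot evaluate e directly, because that would feed the entire columns c1, c2 and c3 into each expression, whereas you want only the individual values in the same row as the expression. For this reason, we need to encapsulate the columns c1 through c3 in a row-wise fashion: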

X <- myData %>% mutate( e = map(factors, f) ) %>%
  rowwise() %>% mutate( d = list(tibble(c1,c2,c3)) ) %>% ungroup()
# # A tibble: 3 x 6
#   c1    c2    c3    factors   e          d
#   <chr> <chr> <chr> <list>    <list>     <list>
# 1 a     1     A     <chr [2]> <language> <tibble [1 × 3]>
# 2 b     2     B     <chr [2]> <language> <tibble [1 × 3]>
# 3 c     3     C     <chr [3]> <language> <tibble [1 × 3]>
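To make the role of the one-row data more concrete, here is a minimal sketch that evaluates a single expression against a single row. It assumes only the rlang package; build_call() is a hypothetical stand-in with the same shape as f() above, and row2 is the second row of the example data written out by hand.

```r
library(rlang)

# Hypothetical helper with the same shape as f(): build an unevaluated
# interaction() call from a character vector of column names.
build_call <- function(cols) {
  expr(interaction(!!!syms(cols), sep = ":", lex.order = TRUE))
}

# One row of the example data, written out by hand for this sketch.
row2 <- data.frame(c1 = "b", c2 = "2", c3 = "B", stringsAsFactors = FALSE)

# The symbols c2 and c3 are looked up in row2 first, so only the values
# from that single row reach interaction().
res <- eval_tidy(build_call(c("c2", "c3")), data = row2)
as.character(res)
# "2:B"
```

Evaluating the same call against the full data would instead combine whole columns, which is why each expression needs to be paired with its own row.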

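Now you have expressions in e that need to be applied to the data in d, so from here it is a simple map2 traversal. Putting everything together and cleaning up, we get: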

myData %>%
  mutate( factors1 = map(factors, lift_dv(interaction, sep=":", lex.order=TRUE)) ) %>%
  mutate( e = map(factors, f) ) %>%
  rowwise() %>% mutate( d = list(tibble(c1,c2,c3)) ) %>% ungroup() %>%
  mutate( factors2 = map2( e, d, rlang::eval_tidy ) ) %>%
  mutate_at( vars(factors1,factors2), lift(fct_c) ) %>%
  select( -e, -d )
# # A tibble: 3 x 6
#   c1    c2    c3    factors   factors1 factors2
#   <chr> <chr> <chr> <list>    <fct>    <fct>
# 1 a     1     A     <chr [2]> c1:c2    a:1
# 2 b     2     B     <chr [2]> c2:c3    2:B
# 3 c     3     C     <chr [3]> c1:c2:c3 c:3:C
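Since the question mentions that all of this runs inside a function, the row-wise lookup could also be wrapped in a small helper. The following is only a sketch under stated assumptions: it uses base R indexing and paste() rather than interaction() and tidy evaluation, it returns a character vector rather than a factor, and the names paste_row_values, names_col and factors2_chr are invented for illustration.

```r
library(tibble)

# Same example data as in the question
myData <- tibble(
  c1 = c("a", "b", "c"),
  c2 = c("1", "2", "3"),
  c3 = c("A", "B", "C"),
  factors = list(c("c1", "c2"), c("c2", "c3"), c("c1", "c2", "c3"))
)

# Hypothetical helper: for each row, read the column names stored in that
# row's list element and paste the matching values from the same row.
paste_row_values <- function(data, names_col = "factors", sep = ":") {
  vapply(
    seq_len(nrow(data)),
    function(i) {
      cols <- data[[names_col]][[i]]            # e.g. c("c2", "c3")
      paste(unlist(data[i, cols]), collapse = sep)
    },
    character(1)
  )
}

myData$factors2_chr <- paste_row_values(myData)
myData$factors2_chr
# "a:1" "2:B" "c:3:C"
```

Passing the list column by name keeps the helper free of hard-coded column references, at the cost of losing the factor levels that interaction() would produce.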

Answer 1: (score: 1)

Here is an approach using map and imap:

library(tidyverse)

myData %>%
  mutate(factor1 = factors %>% map(~interaction(as.list(.), sep=':', lex.order = TRUE)) %>% unlist(),
         factor2 = factors %>% imap(~interaction(myData[.y, match(.x, names(myData))], sep=":", lex.order = TRUE)) %>% unlist())

For factor1, I pass a list into interaction rather than splicing the arguments into dots.

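For factor2, I match the entries of factors in each row against the names of myData, and use the resulting column indices (match(.x, names(myData))) together with the row index (.y from imap) to subset the corresponding elements to feed into interaction.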

Both factor1 and factor2 need unlist, since map and imap return lists.
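The row index trick relies on how imap supplies .y. As a small illustration, assuming only purrr and a made-up list called cols_per_row: for an unnamed list, .y is the position of the element, while .x is the element itself.

```r
library(purrr)

# Hypothetical stand-in for the factors list column (unnamed list)
cols_per_row <- list(c("c1", "c2"), c("c2", "c3"))

# For an unnamed list, .y is the position (1, 2, ...) and .x is the element.
labels <- imap_chr(
  cols_per_row,
  ~ paste0("row ", .y, ": ", paste(.x, collapse = ","))
)
labels
# "row 1: c1,c2" "row 2: c2,c3"
```

If the list carried names, .y would be the name instead of the position, so using it as a row index as in factor2 assumes the list column is unnamed.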

Output:

# A tibble: 3 x 6
  c1    c2    c3    factors   factor1  factor2
  <chr> <chr> <chr> <list>    <fct>    <fct>  
1 a     1     A     <chr [2]> c1:c2    a:1    
2 b     2     B     <chr [2]> c2:c3    2:B    
3 c     3     C     <chr [3]> c1:c2:c3 c:3:C