Mutate conditional on a vector of variables

Time: 2018-07-23 10:22:03

Tags: r dplyr tidyverse

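I want to create a variable that takes the value 1 if a particular value is found in any of several columns, and 0 otherwise. This can be done with ifelse, but it becomes tedious once there are more than about three columns.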

I could probably write a custom function to do this, but I'm curious whether there is an elegant solution in the tidyverse.
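For reference, the custom-function route could look roughly like the base-R sketch below. The helper name `flag_any_between()` and its arguments are hypothetical, and it assumes the test is a string comparison with an exclusive lower bound and an inclusive upper bound (`> 'g'` and `<= 'o'`), as in the example code.

```r
# Hypothetical helper (not from any package): returns 1 for rows where any of
# the columns in `cols` satisfies lower < value <= upper, and 0 otherwise.
flag_any_between <- function(df, cols, lower, upper) {
  # One logical vector per selected column
  hits <- lapply(df[cols], function(col) col > lower & col <= upper)
  # Element-wise OR across the per-column vectors, then convert to 0/1
  as.numeric(Reduce(`|`, hits))
}

# stringsAsFactors = FALSE so the columns stay character on R < 4.0 too
df <- data.frame(
  var1 = c("a", "h", "o", "v"),
  var2 = c("b", "i", "p", "w"),
  var3 = c("c", "j", "q", "x"),
  stringsAsFactors = FALSE
)

df$pass <- flag_any_between(df, c("var2", "var3"), lower = "g", upper = "o")
print(df)
```

Because the columns are passed as a character vector, the same call scales to any number of columns without extra `ifelse()` clauses.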

Example code:

library(tidyverse)

example_tib <- tibble(
  var0 = 1:4, 
  var1 = c('a', 'h', 'o', 'v'),
  var2 = c('b', 'i', 'p', 'w'),
  var3 = c('c', 'j', 'q', 'x'),
  var4 = c('d', 'k', 'r', 'y'),
  var5 = c('e', 'l', 's', 'z'),
  var6 = c('f', 'm', 't', 'a'),
  var7 = c('g', 'n', 'u', 'b'),
  var8 = 5:8
)

variables_interest <- sprintf("%s%d", "var", 2:7)

# This doesn't work but 
# shows what I want to do

example_tib %>%
  mutate(pass = ifelse(any(variables_interest) <= 'o' & 
                       any(variables_interest) > 'g', 1, 0))

Desired output:

# A tibble: 4 x 10
   var0 var1  var2  var3  var4  var5  var6  var7   var8  pass
  <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <dbl>
1     1 a     b     c     d     e     f     g         5     0
2     2 h     i     j     k     l     m     n         6     1
3     3 o     p     q     r     s     t     u         7     0
4     4 v     w     x     y     z     a     b         8     0

3 answers:

Answer 0 (score: 3)

Not sure whether this counts as elegant, and it assumes that the value in your row 3 is a mistake:

w <- quo(variables_interest)
example_tib %>% bind_cols(
     example_tib %>% mutate(id=row_number()) %>%
     gather(k,v,UQ(w)) %>%
     group_by(id) %>%
     summarise(pass=as.integer(sum((v>"g")&(v<="o"))>0)) %>%
     select(-id))

## A tibble: 4 x 10
#   var0 var1  var2  var3  var4  var5  var6  var7   var8  pass
#  <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <int>
#1     1 a     b     c     d     e     f     g         5     0
#2     2 h     i     j     k     l     m     n         6     1
#3     3 o     p     q     r     s     t     u         7     0
#4     4 v     w     x     y     z     a     b         8     0
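The same reshape-then-summarise idea can be sketched with `pivot_longer()`, which supersedes `gather()` in newer tidyr. This assumes tidyr 1.0.0 or later and a tidyselect version that provides `all_of()`; it joins the flag back by a row id instead of relying on `bind_cols()` row order.

```r
library(dplyr)
library(tidyr)

example_tib <- tibble(
  var0 = 1:4,
  var1 = c('a', 'h', 'o', 'v'),
  var2 = c('b', 'i', 'p', 'w'),
  var3 = c('c', 'j', 'q', 'x'),
  var4 = c('d', 'k', 'r', 'y'),
  var5 = c('e', 'l', 's', 'z'),
  var6 = c('f', 'm', 't', 'a'),
  var7 = c('g', 'n', 'u', 'b'),
  var8 = 5:8
)
variables_interest <- sprintf("%s%d", "var", 2:7)

# Long format: one row per (original row, selected column) pair
flags <- example_tib %>%
  mutate(id = row_number()) %>%
  pivot_longer(all_of(variables_interest), names_to = "k", values_to = "v") %>%
  group_by(id) %>%
  # any() collapses the per-cell tests into one TRUE/FALSE per original row
  summarise(pass = as.integer(any(v > "g" & v <= "o")))

# Attach the per-row flag to the wide data by row id, then drop the id
result <- example_tib %>%
  mutate(id = row_number()) %>%
  left_join(flags, by = "id") %>%
  select(-id)

print(result)
```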

Answer 1 (score: 2)

If you are open to base R, this can be done very simply:

x <- example_tib[variables_interest]
example_tib$pass <- as.numeric(rowSums(x <= "o" & x > "g")>0)
example_tib
# # A tibble: 4 x 10
#    var0  var1  var2  var3  var4  var5  var6  var7  var8  pass
#   <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <dbl>
# 1     1     a     b     c     d     e     f     g     5     0
# 2     2     h     i     j     k     l     m     n     6     1
# 3     3     o     p     q     r     s     t     u     7     0
# 4     4     v     w     x     y     z     a     b     8     0
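To make the base-R version easier to follow, the sketch below shows the intermediate steps on the selected columns only (as a plain `data.frame`, an assumption made to keep it dependency-free). Comparing a data frame with a single string returns a logical matrix with one column per variable; `rowSums()` counts the `TRUE` cells in each row, and `> 0` turns that count into "at least one column matched".

```r
x <- data.frame(
  var2 = c("b", "i", "p", "w"),
  var3 = c("c", "j", "q", "x"),
  var4 = c("d", "k", "r", "y"),
  var5 = c("e", "l", "s", "z"),
  var6 = c("f", "m", "t", "a"),
  var7 = c("g", "n", "u", "b"),
  stringsAsFactors = FALSE
)

# Logical matrix: rows of x by selected columns
hits <- x > "g" & x <= "o"
print(hits)

# Number of matching columns per row, then a 0/1 flag
counts <- rowSums(hits)
pass <- as.numeric(counts > 0)
print(pass)
```

Row 2 matches in all six columns (`i` through `n`), while the other rows match in none, which is why only row 2 gets `pass = 1`.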

Answer 2 (score: 0)

There may be a more elegant way to do this, but here is one approach:

library(tidyverse)
library(reshape2)
library(magrittr)

example_tib %>% 
  melt('var0') %>% 
  group_by(var0) %>% 
  mutate(pass=variable %>% 
           is_in(variables_interest) %>% 
           and(value <= 'o' & value > 'g') %>% 
           max %>% 
           is_greater_than(0) %>% 
           ifelse(1,0)) %>% 
  dcast(var0+pass~variable)

# var0 pass var1 var2 var3 var4 var5 var6 var7 var8
# 1    0    a    b    c    d    e    f    g    5
# 2    1    h    i    j    k    l    m    n    6
# 3    0    o    p    q    r    s    t    u    7
# 4    0    v    w    x    y    z    a    b    8
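A more compact tidyverse option that postdates this thread is dplyr's `if_any()`, which returns `TRUE` for a row when the predicate holds in any of the selected columns. This is a different technique from the answers above and assumes dplyr 1.0.4 or later.

```r
library(dplyr)

example_tib <- tibble(
  var0 = 1:4,
  var1 = c('a', 'h', 'o', 'v'),
  var2 = c('b', 'i', 'p', 'w'),
  var3 = c('c', 'j', 'q', 'x'),
  var4 = c('d', 'k', 'r', 'y'),
  var5 = c('e', 'l', 's', 'z'),
  var6 = c('f', 'm', 't', 'a'),
  var7 = c('g', 'n', 'u', 'b'),
  var8 = 5:8
)
variables_interest <- sprintf("%s%d", "var", 2:7)

result <- example_tib %>%
  mutate(pass = as.numeric(
    # TRUE when any selected column satisfies 'g' < value <= 'o'
    if_any(all_of(variables_interest), ~ .x > "g" & .x <= "o")
  ))

print(result)
```

The companion `if_all()` would instead require every selected column to match.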