Question

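I want to create a new column (TRUE/FALSE) based on whether any value from a list appears in any of several columns. For this example I use mtcars and search two columns for two values, but my real problem involves many values across many columns.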

I can filter successfully with filter_at(), shown below, but I cannot apply the same logic inside mutate():

# there are 7 cars with 6 cyl
mtcars %>%
  filter(cyl == 6)

# there are 2 cars with 19.2 mpg, one with 6 cyl, one with 8
mtcars %>% 
  filter(mpg == 19.2)

# there are 8 rows with either.
# these are the rows I want as TRUE
mtcars %>% 
  filter(mpg == 19.2 | cyl == 6)

# set the cols to look at
mtcars_cols <- mtcars %>% 
  select(matches('^(mp|cy)')) %>% names()

# set the values to look at
mtcars_numbs <- c(19.2, 6)

# result is 8 rows with either value in either col.
# this is a successful filter of the data
out1 <- mtcars %>% 
    filter_at(vars(mtcars_cols), any_vars(
        . %in% mtcars_numbs
        )
      )

# shows all the 6 cyl cars, plus one 8 cyl car with 19.2 mpg
out1 %>% 
  select(mpg, cyl)

# This attempts to apply the filter list to the cols,
# but I only get 6 rows as True
# I tried to change == to %in% but that results in an error
out2 <- mtcars %>%
    mutate(
      myset = rowSums(select(., mtcars_cols) == mtcars_numbs) > 0
    )

# only 6 rows returned
out2 %>% 
  filter(myset == T)

I am not sure why two rows are skipped. I suspect that rowSums is somehow aggregating those two rows.

Answer 1

If we want to check each column against its corresponding value, it is better to use map2:

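library(dplyr)
library(purrr)

# pair each column with its value: mpg with 19.2, cyl with 6
map2_df(mtcars_cols, mtcars_numbs, ~
    mtcars %>%
        filter(!!rlang::sym(.x) == .y)) %>%
    distinct()  # drop rows matched by both conditions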

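Note: comparing floating-point numbers with == can cause trouble, because small differences in precision can make the comparison return FALSE.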
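As a minimal sketch of the floating-point caveat, the snippet below contrasts == with dplyr's near(), which compares within a small tolerance. The names x and hits are made up for this illustration and are not part of the original answer.

```r
library(dplyr)

# sqrt(2)^2 is not stored as exactly 2, so == fails
x <- sqrt(2)^2
x == 2        # FALSE
near(x, 2)    # TRUE, compares within a small tolerance

# the same idea applied to the mpg filter from the question
hits <- mtcars %>%
  filter(near(mpg, 19.2))
nrow(hits)
```

In mtcars the mpg values are stored as plain literals, so == happens to work here; near() is the safer choice when the values come out of a calculation.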

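Also note that == only behaves as expected when the lhs and rhs have the same length, or when the rhs vector has length 1 (in which case it is recycled). If the rhs is longer than 1 but does not match the length of the lhs, it is recycled down the columns, so the values are compared in column order rather than one value per column.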
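The recycling behaviour described above can be seen on a toy data frame. This is only an illustrative sketch; df, vals, wrong and right are hypothetical names, not objects from the question.

```r
# toy data: we would like to test a == 1 and b == 2
df   <- data.frame(a = c(1, 2, 1, 2), b = c(2, 1, 2, 1))
vals <- c(1, 2)

# vals is recycled down each column as 1, 2, 1, 2, ...
# so rows 1 and 3 are compared with 1, rows 2 and 4 with 2
wrong <- df == vals
wrong

# indexing vals by column number pairs a with 1 and b with 2
right <- df == vals[col(df)]
right

rowSums(right) > 0
```

With the plain comparison, every entry in column a is TRUE and every entry in column b is FALSE, which is not the per-column test that was intended. The col() version gives TRUE only in rows 1 and 3 for both columns.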

We can replicate the values so that the lengths match, and then it should work:

mtcars %>%
 mutate(
   myset = rowSums(select(., mtcars_cols) == mtcars_numbs[col(select(., mtcars_cols))]) > 0
   ) %>% pull(myset) %>% sum
#[1] 8

The code above calls select twice to make the logic easier to follow. Alternatively, we can use rep:

mtcars %>%
 mutate(
   myset = rowSums(select(., mtcars_cols) == rep(mtcars_numbs, each = n())) > 0
    ) %>% 
   pull(myset) %>%
   sum
#[1] 8
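If the goal is the same "any value in any column" logic as the filter_at() call in the question, one possible alternative is if_any(), assuming dplyr 1.0.4 or later is available. This is a sketch and not part of the original answer; out3 is a made-up name, and the column and value vectors are redefined so the snippet runs on its own.

```r
library(dplyr)

mtcars_cols  <- c("mpg", "cyl")   # same columns as in the question
mtcars_numbs <- c(19.2, 6)        # same values as in the question

# TRUE when any selected column holds any of the values
out3 <- mtcars %>%
  mutate(myset = if_any(all_of(mtcars_cols), ~ .x %in% mtcars_numbs))

sum(out3$myset)
#[1] 8
```

Unlike the col() and rep() versions, this checks every value against every column rather than pairing each column with one value; for this data both approaches flag the same 8 rows.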
