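I want to create a new column (T/F) that flags rows where any value from a list appears in any of several columns. For this example I use mtcars and search for two values in two columns, but my real problem involves many values across many columns.
I can filter successfully with filter_at(), as shown below, but I cannot apply the same logic inside a mutate: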
library(dplyr)

# there are 7 cars with 6 cyl
mtcars %>%
  filter(cyl == 6)
# there are 2 cars with 19.2 mpg, one with 6 cyl, one with 8
mtcars %>%
  filter(mpg == 19.2)
# there are 8 rows with either.
# these are the rows I want as TRUE
mtcars %>%
  filter(mpg == 19.2 | cyl == 6)
# set the cols to look at
mtcars_cols <- mtcars %>%
  select(matches('^(mp|cy)')) %>%
  names()
# set the values to look at
mtcars_numbs <- c(19.2, 6)
# result is 8 rows with either value in either col.
# this is a successful filter of the data
out1 <- mtcars %>%
  filter_at(vars(mtcars_cols), any_vars(. %in% mtcars_numbs))
# shows the set with all 6 cyl cars, plus one 8 cyl car with 19.2 mpg
out1 %>%
  select(mpg, cyl)
# This attempts to apply the filter list to the cols,
# but I only get 6 rows as TRUE.
# I tried changing == to %in% but that results in an error
out2 <- mtcars %>%
  mutate(
    myset = rowSums(select(., mtcars_cols) == mtcars_numbs) > 0
  )
# only 6 rows returned
out2 %>%
  filter(myset == T)
I am not sure why two rows are skipped. I suspect that the use of rowSums is somehow aggregating those two rows.
Answer 0 (score: 1)
If we want to check each column against its corresponding value, map2 is a better fit:
library(dplyr)
library(purrr)
# filter once per (column, value) pair, then drop rows matched more than once
map2_df(mtcars_cols, mtcars_numbs, ~
  mtcars %>%
    filter(!! rlang::sym(.x) == .y)) %>%
  distinct()
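Note: comparing floating-point numbers with == can cause trouble, because two values that print the same may differ slightly in precision and compare as FALSE.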
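To illustrate the floating-point caveat, here is a minimal sketch using dplyr::near(), which compares within a small tolerance instead of exactly. The value 0.1 + 0.2 is only an illustrative choice and does not come from the question.

```r
library(dplyr)

# 0.1 + 0.2 prints as 0.3 but is stored with a tiny rounding error
x <- 0.1 + 0.2

exact_match <- x == 0.3          # FALSE: exact comparison fails
tolerant_match <- near(x, 0.3)   # TRUE: equal within a small tolerance

# the same idea applied to the mpg filter from the question
near_rows <- mtcars %>%
  filter(near(mpg, 19.2))
```

For mtcars the exact comparison happens to work, since the mpg values are stored as entered, but near() is the safer choice when the values come out of a calculation.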
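Also note that == only works as intended when the lhs and rhs have the same length, or when the rhs is a vector of length 1 (in which case it is recycled). If the rhs has a length greater than 1 that does not equal the length of the lhs, it is recycled in column order during the comparison.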
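The recycling behaviour can be seen on a small made-up data frame (df and vals below are hypothetical names used only for illustration). The length-2 vector is recycled down the rows of each column, so odd rows are compared with the first value and even rows with the second, regardless of which column they are in.

```r
# hypothetical toy data: 4 rows, 2 columns
df <- data.frame(a = c(1, 1, 1, 1), b = c(2, 2, 2, 2))
vals <- c(1, 2)

# vals is recycled in column order: 1, 2, 1, 2 down column a,
# then 1, 2, 1, 2 again down column b
recycled <- df == vals
# column a is TRUE only in rows 1 and 3, column b only in rows 2 and 4

# repeating each value once per row lines it up with its own column
aligned <- df == rep(vals, each = nrow(df))
# every cell is TRUE
```

Applied to the question's data, this would mean that odd-numbered rows of mtcars are tested against 19.2 and even-numbered rows against 6, which is consistent with 6-cylinder cars in odd rows being missed.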
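We can replicate the values so that the lengths match (here by indexing mtcars_numbs with col()), and the comparison then works: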
mtcars %>%
  mutate(
    myset = rowSums(select(., mtcars_cols) == mtcars_numbs[col(select(., mtcars_cols))]) > 0
  ) %>%
  pull(myset) %>%
  sum()
#[1] 8
In the code above, select is called twice to make the steps easier to follow. Alternatively, we can use rep:
mtcars %>%
  mutate(
    myset = rowSums(select(., mtcars_cols) == rep(mtcars_numbs, each = n())) > 0
  ) %>%
  pull(myset) %>%
  sum()
#[1] 8
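If the goal is the question's original one, flagging rows where any of the listed values appears in any of the chosen columns, a sketch along the following lines may be simpler. It assumes dplyr 1.0.4 or later, where if_any() is available, and uses all_of() to select columns from a character vector. Neither function appears in the original post, and out3 and n_true are names introduced here for illustration.

```r
library(dplyr)

mtcars_cols <- c("mpg", "cyl")   # columns to search
mtcars_numbs <- c(19.2, 6)       # values to look for

out3 <- mtcars %>%
  mutate(
    # TRUE when any selected column holds any of the values
    myset = if_any(all_of(mtcars_cols), ~ .x %in% mtcars_numbs)
  )

# number of flagged rows
n_true <- sum(out3$myset)
```

Unlike the column-wise == comparison above, %in% matches every value against every selected column, mirroring the filter_at() call in the question. The two approaches give the same rows here, but they would differ if a value could turn up in a column it was not paired with.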