dplyr: conditional column selection with select_if()

Time: 2019-01-11 19:37:42

Tags: r dplyr

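A follow-up to a previous question...

How can I select all columns of a given type, except for one column that is matched with a select helper function?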

select_if(iris, is.numeric, vars(-contains("Width")))
Error: No tidyselect variables were registered

I have this inside a nested data frame and run it through purrr::map(), which complicates the workflow options somewhat:

iris %>% 
  group_by(Species) %>% 
  nest %>% 
  mutate(data = map(data, ~ .x %>% select_if(is.numeric) %>% mutate(count = sum(rowSums(.))))) %>%
  mutate(data = map(data, ~ .x %>% select_if(is.numeric) %>% 
                      mutate_all(funs((. / count) * 100 )))) %>%

  unnest 

3 Answers:

Answer 0: (score: 1)

You can do it like this:

select_if(iris %>% select_at(vars(-contains("Width"))), is.numeric)

    Sepal.Length Petal.Length
1            5.1          1.4
2            4.9          1.4
3            4.7          1.3
4            4.6          1.5
5            5.0          1.4

Answering the updated question:

df1 <- iris %>% 
 group_by(Species) %>% 
 nest() %>% 
 mutate(data = map(data, function(x) select_if(x %>% select_at(vars(-contains("Width"))), is.numeric) %>% mutate(count = sum(rowSums(.))))) %>%
 mutate(data = map(data, function(x) select_if(x %>% select_at(vars(-contains("Width"))), is.numeric) %>% mutate_all(funs((. / count) * 100 )))) %>%
 unnest() 

df2 <- iris %>% 
 group_by(Species) %>% 
 nest() %>% 
 mutate(data = map(data, ~ .x %>% select_if(is.numeric) %>% select_at(vars(-contains("Width"))) %>% mutate(count = sum(rowSums(.))))) %>%
 mutate(data = map(data, ~ .x %>% select_if(is.numeric) %>% select_at(vars(-contains("Width"))) %>% mutate_all(funs((. / count) * 100 )))) %>%
 unnest() 

identical(df1, df2)
[1] TRUE

As the df1 code shows, you can also nest one selection call inside the other and get the same result as piping the two selection calls one after the other.
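The same point can be checked on the flat iris data: filtering columns by name and filtering them by type are independent steps, so the order should not matter. A minimal sketch, assuming a dplyr version in which select_at() and select_if() are still available (the names by_name_first and by_type_first are introduced here only for illustration):

```r
library(dplyr)

# Drop the "Width" columns first, then keep the numeric ones.
by_name_first <- iris %>%
  select_at(vars(-contains("Width"))) %>%
  select_if(is.numeric)

# Keep the numeric columns first, then drop the "Width" ones.
by_type_first <- iris %>%
  select_if(is.numeric) %>%
  select_at(vars(-contains("Width")))

# Both orders should leave the same columns.
names(by_name_first)
names(by_type_first)
```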

Answer 1: (score: 1)

The simplest and clearest way is to pipe the two select functions together:

iris %>%
    select_if(is.numeric) %>%       # Select all numeric columns
    select(-contains('Width')) %>%  # Then drop 'Width' column(s)
    head

  Sepal.Length Petal.Length
1          5.1          1.4
2          4.9          1.4
3          4.7          1.3
4          4.6          1.5
5          5.0          1.4
6          5.4          1.7
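For readers on a newer dplyr, the two steps can probably be folded into one select() call. This is a sketch rather than part of the original answer: it assumes dplyr 1.0.0 or later, where where() and the boolean operators & and ! are supported inside select(). The name numeric_no_width is illustrative only.

```r
library(dplyr)

# Assumption: dplyr >= 1.0.0 (where() and boolean selection operators).
# Keep columns that are numeric AND whose name does not contain "Width".
numeric_no_width <- iris %>%
  select(where(is.numeric) & !contains("Width"))

head(numeric_no_width)
```

Combining the type test and the name test in one expression avoids the problem in the question, where the name helper was passed to an argument of select_if() that does not accept selections.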

This works even inside a map function:

iris %>% 
    group_by(Species) %>% 
    nest %>% 
    mutate(data = map(data, ~ .x %>%
                          select_if(is.numeric) %>%
                          select(-contains('Width')) %>%
                          mutate(count = sum(rowSums(.))))) %>%
    mutate(data = map(data, ~ .x %>%
                          select_if(is.numeric) %>%
                          select(-contains('Width')) %>% 
                          mutate_all(funs((. / count) * 100 )))) %>%

    unnest 

# A tibble: 150 x 4
   Species Sepal.Length Petal.Length count
   <fct>          <dbl>        <dbl> <dbl>
 1 setosa          1.58        0.433   100
 2 setosa          1.52        0.433   100
 3 setosa          1.45        0.402   100
 4 setosa          1.42        0.464   100
 5 setosa          1.55        0.433   100
 6 setosa          1.67        0.526   100
 7 setosa          1.42        0.433   100
 8 setosa          1.55        0.464   100
 9 setosa          1.36        0.433   100
10 setosa          1.52        0.464   100
# ... with 140 more rows
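The nested pipeline above repeats the same selection in two map() calls and relies on funs(), which newer dplyr releases deprecate. One possible restructuring is sketched below, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later. The helper pct_of_total is a hypothetical name, and unlike the output above it does not keep the count column, which is always 100 after the division.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper: keep the numeric non-"Width" columns of one group
# and express every value as a percentage of the group's grand total.
pct_of_total <- function(df) {
  num <- select(df, where(is.numeric) & !contains("Width"))
  total <- sum(rowSums(num))  # grand total over the kept columns
  mutate(num, across(everything(), ~ .x / total * 100))
}

result <- iris %>%
  group_by(Species) %>%
  nest() %>%
  mutate(data = map(data, pct_of_total)) %>%
  unnest(data) %>%
  ungroup()

head(result)
```

Putting the selection and the arithmetic in one function means the column filter is written once, and each group is processed in a single pass.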

Answer 2: (score: 0)

select_if(iris[, !colnames(iris) %in% ("Sepal.Width")], is.numeric)
    Sepal.Length Petal.Length Petal.Width
1            5.1          1.4         0.2
2            4.9          1.4         0.2
3            4.7          1.3         0.2
4            4.6          1.5         0.2
5            5.0          1.4         0.2
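Note that this excludes only the column named exactly "Sepal.Width", which is why Petal.Width is still in the output. If the goal is to drop every column whose name contains "Width", a base R variant might look like the sketch below (keep and numeric_no_width_base are illustrative names, not from the answer):

```r
# TRUE for columns that are numeric and whose name does not contain "Width".
keep <- vapply(iris, is.numeric, logical(1)) &
  !grepl("Width", names(iris), fixed = TRUE)

# drop = FALSE keeps a data frame even if only one column survives.
numeric_no_width_base <- iris[, keep, drop = FALSE]

head(numeric_no_width_base, 5)
```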