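dplyr summarize_all() under certain conditions

Time: 2019-03-24 22:03:28

Tags: r dplyr

I have a data frame with an ID column and several columns to summarize. The columns are mutually exclusive (each row has a value in only one of them). For each column, I want to count the rows that match "a", the rows that match "b", and the rows that match either.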


What I have so far:

[code block not preserved in the source]

Am I taking the right approach? I am trying to get something like the following:

> df
# A tibble: 5 x 4
     id col1  col2  col3
  <dbl> <chr> <chr> <chr>
1     1 NA    b     NA
2     2 NA    b     NA
3     3 NA    NA    a
4     4 b     NA    NA
5     5 a     NA    NA
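
To run the answers below, the printed tibble can be rebuilt roughly as follows. This is a reconstruction from the printout, not the asker's original code, and it assumes the three value columns are plain character vectors.

```r
library(tibble)

# Reconstructed from the printed tibble above; not the asker's original code.
df <- tibble(
  id   = c(1, 2, 3, 4, 5),
  col1 = c(NA, NA, NA, "b", "a"),
  col2 = c("b", "b", NA, NA, NA),
  col3 = c(NA, NA, "a", NA, NA)
)
```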

2 answers:

Answer 0 (score: 1)

You can try:

library(tidyverse)

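df %>%
  gather(key, value, -id) %>%   # reshape to long format: one row per id/column pair
  group_by(key, value) %>%
  count() %>%                   # number of rows per column/value combination
  filter(!is.na(value))         # drop the counts of NA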

# A tibble: 4 x 3
# Groups:   key, value [4]
  key   value     n
  <chr> <chr> <int>
1 col1  a         1
2 col1  b         1
3 col2  b         2
4 col3  a         1

If you want the result in the tabular form described in the question, you can do the following:

df %>%
  gather(key, value, -id) %>%
  group_by(key, value) %>%
  count() %>%
  filter(!is.na(value)) %>%
  group_by(key) %>%
  mutate(x = sum(n)) %>%        # total non-NA rows per column
  spread(value, n, fill = 0)    # one column per value, 0 where a value is absent

# A tibble: 3 x 4
# Groups:   key [3]
  key       x     a     b
  <chr> <int> <dbl> <dbl>
1 col1      2     1     1
2 col2      2     0     2
3 col3      1     1     0
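
In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough equivalent of the pipeline above might look like the sketch below. It assumes tidyr 1.0.0 or later, rebuilds df from the tibble printed in the question, and introduces the name res purely for illustration.

```r
library(dplyr)
library(tidyr)

# Assumed input, rebuilt from the tibble printed in the question.
df <- tibble::tibble(
  id   = c(1, 2, 3, 4, 5),
  col1 = c(NA, NA, NA, "b", "a"),
  col2 = c("b", "b", NA, NA, NA),
  col3 = c(NA, NA, "a", NA, NA)
)

res <- df %>%
  # long format, dropping the NA cells up front
  pivot_longer(-id, names_to = "key", values_to = "value",
               values_drop_na = TRUE) %>%
  # rows per column/value combination
  count(key, value) %>%
  # one column per value; combinations that never occur become 0
  pivot_wider(names_from = value, values_from = n,
              values_fill = list(n = 0L)) %>%
  # total of "a" or "b" per column
  mutate(x = a + b)
```

Because count() is called with explicit columns rather than after group_by(), the result is ungrouped, so no extra ungroup() step is needed.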

Answer 1 (score: 1)

One tidyverse possibility could be:

df %>%
  gather(var, letters, -id, na.rm = TRUE) %>%
  add_count(var, letters, name = "n_letters") %>%
  add_count(var, name = "n_all") %>%
  select(-id) %>%
  distinct()

  var   letters n_letters n_all
  <chr> <chr>       <int> <int>
1 col1  b               1     2
2 col1  a               1     2
3 col2  b               2     2
4 col3  a               1     1

Or:

df %>%
  gather(var, letters, -id, na.rm = TRUE) %>%
  add_count(var, letters, name = "n_letters") %>%
  add_count(var, name = "all") %>%
  select(-id) %>%
  distinct() %>%
  spread(letters, n_letters, fill = 0)

  var   all     a     b
  <chr> <int> <dbl> <dbl>
1 col1      2     1     1
2 col2      2     0     2
3 col3      1     1     0
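
Since the title asks about summarize_all(), here is a minimal column-wise sketch for comparison. It is not taken from either answer. It assumes df can be rebuilt from the printed tibble, and the names res, a, b and either are chosen here for illustration. The result is a single wide row with columns such as col1_a and col2_b rather than one row per column.

```r
library(dplyr)

# Assumed input, rebuilt from the tibble printed in the question.
df <- tibble::tibble(
  id   = c(1, 2, 3, 4, 5),
  col1 = c(NA, NA, NA, "b", "a"),
  col2 = c("b", "b", NA, NA, NA),
  col3 = c(NA, NA, "a", NA, NA)
)

res <- df %>%
  select(-id) %>%
  # apply each counting function to every remaining column;
  # na.rm = TRUE keeps the NA cells from turning the sums into NA
  summarise_all(list(
    a      = ~ sum(. == "a", na.rm = TRUE),
    b      = ~ sum(. == "b", na.rm = TRUE),
    either = ~ sum(. %in% c("a", "b"))
  ))
```

This stays wide, so the reshaping approaches in the answers above are still the more direct way to get one row per column.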