Counting logical values with summarise in R

Time: 2019-02-16 01:44:50

Tags: r dplyr grouping summarization

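In a data frame, I have a column with Y and N values. The data frame also has an id column. For each id, I want to create two columns: one with the total count of Y and one with the total count of N. I tried to do this with dplyr's summarise function: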

df %>%
  group_by(id) %>%
  summarise(total_not = count(column_y_e_n == "N"),
            total_yes = count(column_y_e_n == "Y"))

but I got this error message:

Error in summarise_impl(.data, dots)

Any suggestions?

4 answers:

Answer 0 (score: 0)

I would solve this with group_by() and tally(). Alternatively, you can skip the intermediate step and use count() directly.

library(tidyverse)

##Fake data
df <- tibble(
    id = rep(1:20,each = 10),
    column_y_e_n = sapply(1:200, function(i)sample(c("Y", "N"),1))
)

##group_by() + tally()
df_2 <- df %>%
    group_by(id, column_y_e_n) %>%
    tally() %>%
    spread(column_y_e_n, n, fill = 0) %>%  # fill = 0: an id with no "N" (or no "Y") gets 0 instead of NA
    magrittr::set_colnames(c("id", "total_not", "total_yes"))


df_2

#direct method
df_3 <- df %>%
    count(id, column_y_e_n) %>%
    spread(column_y_e_n, n, fill = 0) %>%  # fill = 0: an id with no "N" (or no "Y") gets 0 instead of NA
    magrittr::set_colnames(c("id", "total_not", "total_yes"))

df_3

The final pipe steps spread the counts into one column per value and set the column names.
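In tidyr 1.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same direct method with pivot_wider() follows, assuming tidyr 1.1 or later (for a scalar values_fill) and a small illustrative data set in place of the random one. It also renames the columns by name rather than by position, so it does not depend on N sorting before Y the way set_colnames() does.

```r
library(dplyr)
library(tidyr)

# Small illustrative data with the same shape as the question's data
df <- tibble(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

df_wide <- df %>%
  count(id, column_y_e_n) %>%            # one row per (id, value) pair, count in column n
  pivot_wider(names_from = column_y_e_n,
              values_from = n,
              values_fill = 0) %>%       # missing (id, value) combinations become 0, not NA
  rename(total_not = N, total_yes = Y)   # rename by name, not by column position

df_wide
```

Here id 3 has no "Y" rows, so its total_yes is filled with 0.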

Answer 1 (score: 0)

A slight variation on Harro's original answer:

library(dplyr)  # group_by() and summarize()
library(tidyr)  # spread()

dfr <- data.frame(
  id =  c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  bool = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

dfrSummary <- dfr %>% 
  group_by(
    id, bool
    ) %>% 
  summarize(
    count = n()
    ) %>% 
  spread(
    key = bool, 
    value = count, 
    fill = 0
    )

Answer 2 (score: 0)

I replaced the count function with sum, and it worked.

df %>%
  group_by(id) %>%
  summarise(total_not = sum(column_y_e_n == "N"),
            total_yes = sum(column_y_e_n == "Y"))
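Why sum() works where count() fails: dplyr's count() expects a data frame (it counts rows per group), not a logical vector. Comparing the column with "N" or "Y" yields a logical vector, and sum() treats TRUE as 1 and FALSE as 0, so it returns the number of matches. A minimal runnable sketch, using an illustrative data frame named dfr:

```r
library(dplyr)

# A comparison gives a logical vector; sum() counts the TRUE values
sum(c("Y", "N", "Y") == "Y")   # 2

# Illustrative data
dfr <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

totals <- dfr %>%
  group_by(id) %>%
  summarise(total_not = sum(column_y_e_n == "N"),   # number of "N" rows per id
            total_yes = sum(column_y_e_n == "Y"))   # number of "Y" rows per id

totals
```

Because every id gets a row from summarise(), an id with no "Y" values simply gets total_yes = 0, with no reshaping step needed.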

Answer 3 (score: 0)

I usually like to do everything the tidy way, but in this case a base R solution seems appropriate:

dfr <- data.frame(
  id =  c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

table(dfr)

which gives you

   column_y_e_n
id  N Y
  1 1 4
  2 3 2
  3 3 0
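table() returns a contingency table rather than a data frame. If you need the id / total_not / total_yes columns the question asks for, one possible base R conversion is as.data.frame.matrix(), which keeps the wide layout (plain as.data.frame() would instead give a long id / column_y_e_n / Freq table). A minimal sketch; the names tab and wide are illustrative:

```r
dfr <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

tab <- table(dfr)                        # two-way table: id x column_y_e_n

# Keep one column per value (N, Y); the ids end up as row names
wide <- as.data.frame.matrix(tab)
wide$id <- as.numeric(rownames(wide))    # move the ids from row names into a column
names(wide)[names(wide) == "N"] <- "total_not"
names(wide)[names(wide) == "Y"] <- "total_yes"
rownames(wide) <- NULL
wide <- wide[, c("id", "total_not", "total_yes")]

wide
```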