Counting "Yes" values across multiple columns of a data frame with dplyr

Time: 2019-11-30 13:35:51

Tags: r dplyr

Suppose I have the following data. [Adding the data, as requested]

col1 <- c("Team A", "Team A", "Team A", "Team B", "Team B", "Team B", "Team C", "Team C", "Team C", "Team D", "Team D", "Team D")
col2 <- c("High",   "Medium", "Medium", "Low", "Low", "Low", "High", "Medium", "Low", "Medium", "Medium", "Medium")
col3 <- c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes")
col4 <- c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes")
df <- data.frame(col1, col2, col3, col4)
# col1      col2    col3    col4
# Team A    High    Yes     No
# Team A    Medium  Yes     Yes
# Team A    Medium  No      No
# Team B    Low     No      Yes
# Team B    Low     No      Yes
# Team B    Low     Yes     No
# Team C    High    No      No
# Team C    Medium  Yes     Yes
# Team C    Low     No      No
# Team D    Medium  Yes     Yes
# Team D    Medium  Yes     No
# Team D    Medium  Yes     Yes

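I want to use dplyr functions to get the following result. Status_1 is the number of "Yes" values in col3 for each team, and Status_2 is the number of "Yes" values in col4 for each team: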

       High Medium  Low Status_1    Status_2
Team A    1      2    0        2           1
Team B    0      0    3        1           2
Team C    1      1    1        1           1
Team D    0      3    0        3           2

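With the following statement I can produce the general summary (the High/Medium/Low counts), but not the last two columns, Status_1 and Status_2. Can anyone help?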

library(dplyr)
library(tidyr)

df %>%
  group_by(col1, col2) %>%
  summarise(Count = n()) %>%
  ungroup() %>%
  spread(col2, Count, fill = 0)   # one column per level of col2
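A minimal sketch of one way to get the missing Status columns on top of the spread approach, assuming dplyr and tidyr are installed: build the level counts and the per-team "Yes" counts as two separate tables, then join them on col1. The names level_counts, yes_counts and result are illustrative only, and stringsAsFactors = FALSE is set so the data behaves the same on R versions before and after 4.0.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  col1 = rep(c("Team A", "Team B", "Team C", "Team D"), each = 3),
  col2 = c("High", "Medium", "Medium", "Low", "Low", "Low",
           "High", "Medium", "Low", "Medium", "Medium", "Medium"),
  col3 = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"),
  col4 = c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# One row per team, one column per level of col2; absent combinations become 0
level_counts <- df %>%
  count(col1, col2) %>%
  spread(col2, n, fill = 0)

# Number of "Yes" values per team in col3 and col4
yes_counts <- df %>%
  group_by(col1) %>%
  summarise(Status_1 = sum(col3 == "Yes"),
            Status_2 = sum(col4 == "Yes"))

# Join the two summaries and put the columns in the order from the question
result <- level_counts %>%
  left_join(yes_counts, by = "col1") %>%
  select(col1, High, Medium, Low, Status_1, Status_2)

print(result)
```

Keeping the two summaries separate means each one is computed at its natural grain (team by level for the counts, team only for the Yes totals), so nothing has to be re-grouped afterwards.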

2 answers:

Answer 0 (score: 2)

I would use sum together with grepl to simply count the matches:

library(dplyr)

df %>%
  mutate_if(is.factor, as.character) %>% # your example data was stored as factor
  group_by(col1) %>%
  summarise(High = sum(grepl("High", col2)),
            Medium = sum(grepl("Medium", col2)),
            Low = sum(grepl("Low", col2)),
            Status_1 = sum(grepl("Yes", col3)),
            Status_2 = sum(grepl("Yes", col4)))
#> # A tibble: 4 x 6
#>   col1    High Medium   Low Status_1 Status_2
#>   <chr>  <int>  <int> <int>    <int>    <int>
#> 1 Team A     1      2     0        2        1
#> 2 Team B     0      0     3        1        2
#> 3 Team C     1      1     1        1        1
#> 4 Team D     0      3     0        3        2

Created on 2019-11-30 by the reprex package (v0.3.0)

You could also use str_count or str_detect from stringr instead of grepl; in this case they all do the same thing. The important part is wrapping the match in sum, so that each team's matches are added up into a single count.
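A possible variation, sketched under the assumption that dplyr 1.0.0 or later is available (across() was introduced in that release, after this question was asked): compare with == instead of grepl, which only counts exact values (grepl("Yes", "Yesterday") is TRUE, whereas "Yesterday" == "Yes" is FALSE), and let across() apply the same Yes count to col3 and col4.

```r
library(dplyr)

df <- data.frame(
  col1 = rep(c("Team A", "Team B", "Team C", "Team D"), each = 3),
  col2 = c("High", "Medium", "Medium", "Low", "Low", "Low",
           "High", "Medium", "Low", "Medium", "Medium", "Medium"),
  col3 = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"),
  col4 = c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(col1) %>%
  summarise(
    # Exact matches only; sum() of a logical vector counts the TRUE values
    High   = sum(col2 == "High"),
    Medium = sum(col2 == "Medium"),
    Low    = sum(col2 == "Low"),
    # Same "count the Yes values" function applied to both columns
    across(c(col3, col4), ~ sum(.x == "Yes"))
  ) %>%
  rename(Status_1 = col3, Status_2 = col4)

print(result)
```

The == comparison also works if the columns are factors, since a factor compared with a string is compared by its labels, so the mutate_if() conversion step is not needed here.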

Answer 1 (score: 2)

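First, group the data by col1 and count the number of "Yes" values in col3 and col4. Then group by all columns again and use n() to count the observations in each group. Finally, use tidyr::pivot_wider to reshape the data from long to wide.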

df %>%
  group_by(col1) %>%
  mutate_at(vars(col3:col4), ~ sum(. == "Yes")) %>%
  rename(status_1 = col3, status_2 = col4) %>% 
  group_by_all %>%
  summarise(n = n()) %>%
  tidyr::pivot_wider(names_from = col2, values_from = n, values_fill = list(n = 0))

# # A tibble: 4 x 6
#   col1   status_1 status_2  High Medium   Low
#   <fct>     <int>    <int> <int>  <int> <int>
# 1 Team A        2        1     1      2     0
# 2 Team B        1        2     0      0     3
# 3 Team C        1        1     1      1     1
# 4 Team D        3        2     0      3     0
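The pipeline above places the status columns before the level counts. If the exact column order from the question matters, a select() at the end can reorder them. The sketch below assumes tidyr 1.0.0 or later (for pivot_wider); it also adds an ungroup() before the reshape, as a defensive choice rather than something the original answer requires, so that pivot_wider starts from an ungrouped table.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  col1 = rep(c("Team A", "Team B", "Team C", "Team D"), each = 3),
  col2 = c("High", "Medium", "Medium", "Low", "Low", "Low",
           "High", "Medium", "Low", "Medium", "Medium", "Medium"),
  col3 = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"),
  col4 = c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(col1) %>%
  mutate_at(vars(col3:col4), ~ sum(. == "Yes")) %>%   # per-team Yes counts on every row
  rename(status_1 = col3, status_2 = col4) %>%
  group_by_all() %>%
  summarise(n = n()) %>%                               # rows per team and level
  ungroup() %>%
  pivot_wider(names_from = col2, values_from = n, values_fill = list(n = 0L)) %>%
  select(col1, High, Medium, Low, status_1, status_2) # column order from the question

print(result)
```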