Summarize and find matches

Time: 2019-09-14 18:26:51

Tags: r dplyr

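How can I use dplyr, after grouping by column A, to add a new column C so that if any row of a group (e.g. "a1") has A22 or R22 in column B, every row of that group gets "Yes", and otherwise "No"?

Please help me achieve this.

Data frame: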

df1 <- data.frame(A= c("a1","a1","a1","b1","b1","b1","c1","c1"),
                  B= c("A22","B2","C2","R22","G2","C2","G2","O2"))
A           B            
a1         A22
a1         B2
a1         C2
b1         R22
b1         G2
b1         C2
c1         G2
c1         O2

Expected result:

A           B       C
a1         A22      Yes
a1         B2       Yes
a1         C2       Yes 
b1         R22      Yes
b1         G2       Yes
b1         C2       Yes
c1         G2       No
c1         O2       No

3 answers:

Answer 0 (score: 2)

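We can group by 'A' and check whether `any` of the strings 'A22', 'R22' is `%in%` 'B'. Adding 1 to that logical value gives an index of 1 or 2 into `c("No", "Yes")`, which creates the 'No'/'Yes' values in 'C'.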

library(dplyr)
df1 %>%
   group_by(A) %>%
   mutate(C = c("No", "Yes")[1+ (any(c("A22", "R22") %in% B))])
# A tibble: 8 x 3
# Groups:   A [3]
#  A     B     C    
#  <fct> <fct> <chr>
#1 a1    A22   Yes  
#2 a1    B2    Yes  
#3 a1    C2    Yes  
#4 b1    R22   Yes  
#5 b1    G2    Yes  
#6 b1    C2    Yes  
#7 c1    G2    No   
#8 c1    O2    No   
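The indexing idiom works because `any()` returns a single `TRUE` or `FALSE`, and `1 + TRUE` is 2 (selecting "Yes") while `1 + FALSE` is 1 (selecting "No"). A minimal base R sketch of that mapping in isolation; `flag_group()` and `flag_group_ifelse()` are hypothetical helpers for illustration, not part of the answer:

```r
# Hypothetical helper: "Yes" if any target code occurs in x, otherwise "No"
flag_group <- function(x, targets = c("A22", "R22")) {
  # any() yields TRUE/FALSE; 1 + TRUE = 2 picks "Yes", 1 + FALSE = 1 picks "No"
  c("No", "Yes")[1 + any(targets %in% x)]
}

flag_group(c("A22", "B2", "C2"))  # "Yes"
flag_group(c("G2", "O2"))         # "No"

# Hypothetical equivalent using if/else, which some find easier to read
flag_group_ifelse <- function(x, targets = c("A22", "R22")) {
  if (any(targets %in% x)) "Yes" else "No"
}
```

Either function returns one value per group, which `mutate()` then recycles across that group's rows.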

Alternatively, we can use base R:

with(df1, ave(as.character(B), A, FUN = function(x) 
       c("No", "Yes")[1 + any(c("A22", "R22") %in% x)]))
#[1] "Yes" "Yes" "Yes" "Yes" "Yes" "Yes" "No"  "No" 
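`ave()` returns a vector aligned with the rows of `df1`, so its result can be assigned directly as column `C`. A sketch, assuming `df1` is built as in the question; `as.character()` guards against `B` being a factor (the `data.frame()` default before R 4.0). If "contains" is meant as a substring match rather than an exact value, `grepl()` can stand in for `%in%`; that reading, and the extra `C_sub` column, are assumptions for illustration:

```r
df1 <- data.frame(A = c("a1","a1","a1","b1","b1","b1","c1","c1"),
                  B = c("A22","B2","C2","R22","G2","C2","G2","O2"))

# Exact match: every row of a group gets "Yes" if the group has A22 or R22
df1$C <- with(df1, ave(as.character(B), A, FUN = function(x)
  c("No", "Yes")[1 + any(c("A22", "R22") %in% x)]))

# Substring variant (assumption): would also flag values such as "xA22"
df1$C_sub <- with(df1, ave(as.character(B), A, FUN = function(x)
  if (any(grepl("A22|R22", x))) "Yes" else "No"))

df1
```

`ave()` recycles each length-1 result over the rows of its group, so no explicit `rep()` is needed.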

Answer 1 (score: 2)

df1 %>%
    group_by(A) %>%
    mutate(C = factor(max(B %in% c("A22", "R22")),
                      levels = 0:1,
                      labels = c("No", "Yes"))) %>%
    ungroup()
#> # A tibble: 8 x 3
#>   A     B     C    
#>   <fct> <fct> <fct>
#> 1 a1    A22   Yes  
#> 2 a1    B2    Yes  
#> 3 a1    C2    Yes  
#> 4 b1    R22   Yes  
#> 5 b1    G2    Yes  
#> 6 b1    C2    Yes  
#> 7 c1    G2    No   
#> 8 c1    O2    No

Answer 2 (score: 0)

Here is a data.table option:

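library(data.table)
library(dplyr)  # only needed for case_when()

dt <- data.table(
  A = c("a1","a1","a1","b1","b1","b1","c1","c1"),
  B = c("A22","B2","C2","R22","G2","C2","G2","O2"))

# Row-level flag: "Yes" where B itself is A22 or R22
dt[,
   flag := case_when(
     B %in% c("A22", "R22") ~ "Yes",
     TRUE ~ "No"
   )
]
# Ordered factor so that max() ranks "Yes" above "No"
dt[, flag := factor(flag, levels = c("No", "Yes"), ordered = TRUE)]

# Copy each group's maximum flag to all rows of that group
dt[, flag := max(flag), by = A]
dt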
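The ordered factor is only there so that `max()` prefers "Yes" over "No". A more direct sketch, assuming only data.table (no dplyr): compute one flag per group with `by = A`; data.table recycles a length-1 right-hand side of `:=` across the rows of each group.

```r
library(data.table)

dt <- data.table(
  A = c("a1","a1","a1","b1","b1","b1","c1","c1"),
  B = c("A22","B2","C2","R22","G2","C2","G2","O2"))

# One scalar per group; := fills it into every row of that group
dt[, C := if (any(B %in% c("A22", "R22"))) "Yes" else "No", by = A]
dt
```

This keeps `C` as a plain character column, which avoids the intermediate factor conversion.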