R: find members that are in only one of two groups versus in both groups

Time: 2019-11-15 18:32:14

Tags: r dplyr

Suppose this is my data:

Number        Group  Length    
4432          1      NA        
4432          2      2.34      
4564          1      5.89      
4389          1      NA        
6578          2      3.12       
4389          2      NA            
4355          1      4.11      
4355          2      6.15       
4689          1      6.22      
4689          1      NA        

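I am trying to find the vessel Numbers that appear only in Group 1 or only in Group 2, and the vessel Numbers that appear in both Group 1 and Group 2: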

Number        Group  Length    Results
4432          1      NA        Both 1 & 2
4432          2      2.34      Both 1 & 2
4564          1      5.89      1
4389          1      NA        1
6578          2      3.12      2 
4389          2      NA        2    
4355          1      4.11      Both 1 & 2
4355          2      6.15      Both 1 & 2 
4689          1      6.22      1
4689          1      NA        1

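I can do this with a for loop and subsetting, but I am interested in dplyr or another way of creating the Results column. Any help is appreciated. Thanks.

2 Answers:

Answer 0: (score: 5)

We can use n_distinct to check the number of unique 'Group' values and paste the unique 'Group' values together with the prefix "Both":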

[code block not preserved in the source]
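A minimal sketch of how this idea might look with dplyr. This is not the answerer's original code; the data frame name `df` and the exact label format are assumptions taken from the question and from the data posted in the other answer.

```r
library(dplyr)

# Sample data from the question (the name `df` is an assumption)
df <- data.frame(
  Number = c(4432L, 4432L, 4564L, 4389L, 6578L, 4389L, 4355L, 4355L, 4689L, 4689L),
  Group  = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L),
  Length = c(NA, 2.34, 5.89, NA, 3.12, NA, 4.11, 6.15, 6.22, NA)
)

res <- df %>%
  group_by(Number) %>%
  mutate(Results = if (n_distinct(Group) > 1) {
    # More than one distinct group: prefix the collapsed groups with "Both"
    paste("Both", paste(sort(unique(Group)), collapse = " & "))
  } else {
    # Only one distinct group: use it as the label
    as.character(Group)
  }) %>%
  ungroup()
```

Note that in the sample data Number 4389 occurs in both Group 1 and Group 2, so this sketch labels both of its rows "Both 1 & 2".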

If the "Both" prefix is not needed:

[code block not preserved in the source]
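Without the prefix, the label could simply be the collapsed unique groups. Again a hedged sketch rather than the original code, with `df` assumed to hold the question's data:

```r
library(dplyr)

# Sample data from the question (the name `df` is an assumption)
df <- data.frame(
  Number = c(4432L, 4432L, 4564L, 4389L, 6578L, 4389L, 4355L, 4355L, 4689L, 4689L),
  Group  = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L),
  Length = c(NA, 2.34, 5.89, NA, 3.12, NA, 4.11, 6.15, 6.22, NA)
)

res <- df %>%
  group_by(Number) %>%
  # Collapse the distinct groups of each Number into a single label
  mutate(Results = paste(sort(unique(Group)), collapse = " & ")) %>%
  ungroup()
```

Here a Number found in both groups gets "1 & 2", and a Number found in a single group gets just that group.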

Data

[code block not preserved in the source]

Answer 1: (score: 1)

Base R solution:

# Collapse the Group values for each Number into one string, separated by " & ":
aggregated_df <- aggregate(list(Results = df$Group), list(Number = df$Number),
                           paste0, collapse = " & ")

# Keep only the unique groups (this drops the ampersand when a group is repeated):
aggregated_df$Results <- sapply(strsplit(aggregated_df$Results, " & "),
                                function(x) paste0(unique(x), collapse = " & "))

# If the string contains an ampersand, prefix it with "Both ":
aggregated_df$Results <- ifelse(grepl(" & ", aggregated_df$Results),
                                paste0("Both ", aggregated_df$Results),
                                aggregated_df$Results)

# Merge the labels back onto the original data frame:
df <- merge(df, aggregated_df, by = "Number", all.x = TRUE, sort = FALSE)

Base R solution 2 (split, apply, combine):

# Split the data frame by Number, label each piece, then bind the pieces back together:
df2 <- data.frame(do.call("rbind", lapply(split(df, df$Number), function(x) {
  res <- paste0(unique(x$Group), collapse = " & ")
  x$Results <- ifelse(grepl(" & ", res), paste0("Both ", res), res)
  x
})), row.names = NULL)

# split() returns the pieces sorted by Number, so restore the original row order:
df2 <- df2[order(order(df$Number)), ]
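The reordering step depends on the fact that split() returns the groups sorted by Number. A small sketch, using a hypothetical toy data frame, of how the inverse permutation order(order(...)) maps the grouped rows back to their original positions:

```r
# Hypothetical toy data, a subset of the question's rows
toy <- data.frame(
  Number = c(4432L, 4432L, 4564L, 4389L, 6578L, 4389L),
  Group  = c(1L, 2L, 1L, 1L, 2L, 2L)
)

# split() + rbind returns the rows grouped by sorted Number
grouped <- do.call(rbind, split(toy, toy$Number))
rownames(grouped) <- NULL

# order(toy$Number) maps sorted position -> original row;
# applying order() again gives the inverse mapping
restore <- order(order(toy$Number))
restored <- grouped[restore, ]
rownames(restored) <- NULL
```

Indexing with order(toy$Number) alone would apply the sorting permutation a second time instead of undoing it.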

Data:

df <- structure(
  list(
    Number = c(
      4432L,
      4432L,
      4564L,
      4389L,
      6578L,
      4389L,
      4355L,
      4355L,
      4689L,
      4689L
    ),
    Group = c(1L, 2L, 1L, 1L,
              2L, 2L, 1L, 2L, 1L, 1L),
    Length = c(NA, 2.34, 5.89, NA, 3.12,
               NA, 4.11, 6.15, 6.22, NA)
  ),
  class = "data.frame",
  row.names = c(NA,-10L)
)