How to regroup multiple categorical variables into one new variable

Time: 2020-06-13 21:03:56

Tags: r

I have a data.frame (df) with two columns (A, B):

   A    B
1  a TCRB
2  a TCRG
3  a TCRB
4  b TCRB
5  b TCRG
6  c TCRB
7  c TCRB
8  c TCRB
9  c TCRB
10 d TCRG
11 d TCRG
12 d TCRG

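I want to create a new column "C", as shown below, that tells me whether each unique value in "A" has both TCRB and TCRG or only one of them (0 = TCRB only, 1 = TCRG only, 2 = both):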

A: a b c d 
C: 2 2 0 1 

Thank you very much for your help!

3 answers:

Answer 0: (score: 3)

Here is one approach with dplyr:

library(dplyr)
df %>% 
  group_by(A) %>%
  dplyr::summarise(C = case_when("TCRB" %in% B & "TCRG" %in% B ~ 2,
                                 "TCRB" %in% B ~ 0,
                                 "TCRG" %in% B ~ 1,
                                 TRUE ~ NA_real_)) 
# A tibble: 4 x 2
#   A         C
#   <fct> <dbl>
# 1 a         2
# 2 b         2
# 3 c         0
# 4 d         1

Answer 1: (score: 2)

An option with n_distinct:

library(dplyr)
df %>%
    group_by(A) %>%
    summarise(C = n_distinct(B) * !all(B == 'TCRB'))
# A tibble: 4 x 2
#  A         C
#  <chr> <int>
#1 a         2
#2 b         2
#3 c         0
#4 d         1
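
The arithmetic behind the n_distinct expression may not be obvious at first glance: the number of distinct values in B (1 or 2) is multiplied by a logical that is FALSE (0) only when every value is TCRB. A minimal base R sketch of the same calculation for a single group follows. The helper name code_group is hypothetical and not part of the answer, and the sketch assumes B contains only "TCRB" and "TCRG" with no missing values.

```r
# Hypothetical helper (not from the original answer): mirrors
# n_distinct(B) * !all(B == "TCRB") for one group, using base R only.
# Assumes b holds only "TCRB" / "TCRG" and has no NA values.
code_group <- function(b) {
  length(unique(b)) * !all(b == "TCRB")
}

code_group(c("TCRB", "TCRG", "TCRB"))  # both present: 2 * TRUE  -> 2
code_group(c("TCRB", "TCRB"))          # TCRB only:    1 * FALSE -> 0
code_group(c("TCRG", "TCRG", "TCRG"))  # TCRG only:    1 * TRUE  -> 1
```

If B could contain a third category or NA, the product would no longer map cleanly onto 0/1/2, so the explicit conditions used in the other answers would be the safer choice.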

Data

df <- structure(list(A = c("a", "a", "a", "b", "b", "c", "c", "c", 
"c", "d", "d", "d"), B = c("TCRB", "TCRG", "TCRB", "TCRB", "TCRG", 
"TCRB", "TCRB", "TCRB", "TCRB", "TCRG", "TCRG", "TCRG")),
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))

Answer 2: (score: 0)

In base R, we can use aggregate:

aggregate(B~A, df, function(x) {
    if(all(c('TCRB', 'TCRG') %in% x)) 2
    else if(any(x == 'TCRG')) 1
    else if(any(x == 'TCRB')) 0
    else NA
})

#  A B
#1 a 2
#2 b 2
#3 c 0
#4 d 1
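
Note that aggregate keeps the name B for the summarised column. If the goal is a column named C on every row of the original data, one possible follow-up is to rename the result and join it back with merge. The sketch below rebuilds the question's data so it runs on its own; the object names codes and df_with_c are hypothetical and chosen only for illustration.

```r
# Same data as in the question, rebuilt so the sketch is self-contained.
df <- data.frame(
  A = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d", "d", "d"),
  B = c("TCRB", "TCRG", "TCRB", "TCRB", "TCRG", "TCRB",
        "TCRB", "TCRB", "TCRB", "TCRG", "TCRG", "TCRG")
)

# One code per value of A (0 = TCRB only, 1 = TCRG only, 2 = both).
codes <- aggregate(B ~ A, df, function(x) {
  if (all(c("TCRB", "TCRG") %in% x)) 2
  else if (any(x == "TCRG")) 1
  else if (any(x == "TCRB")) 0
  else NA
})
names(codes)[names(codes) == "B"] <- "C"  # rename the aggregated column

# Join the codes back so every original row carries its group's C.
df_with_c <- merge(df, codes, by = "A")
```

Here df_with_c has the original 12 rows with columns A, B and C; codes alone is the four-row summary asked for in the question.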