Get unique combinations by ID in R (combn)

Time: 2016-09-23 04:37:37

Tags: r data.table combn

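I am trying to get the unique combinations of var1 within each ID, but I keep getting a warning and the result is not expanded per ID.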

library(data.table)

ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,5,5,5,5,5,6,6,6,6)
var1 <- c("A","B","E","F","C","D","C","A","B","C","A","D","B","C",
          "A","B","C","A","D","C","A","B","C","E","F","G")
df1 <- data.frame(ID, var1)
df1 <- df1[order(df1$ID, df1$var1), ]
dd <- unique(df1)        # drop duplicated (ID, var1) pairs
dd <- data.table(dd)
# Each call assigns one column of the per-ID combination matrix back to the group
dd[, new4 := t(combn(sort(var1), m = 3))[, 1], by = "ID"]
dd[, new5 := t(combn(sort(var1), m = 3))[, 2], by = "ID"]
dd[, new6 := t(combn(sort(var1), m = 3))[, 3], by = "ID"]

Warning message:
In `[.data.table`(dd, , `:=`(new4, t(combn(sort(var1), m = 3))[,  :
RHS 1 is length 10 (greater than the size (5) of group 1). The last 5 element(s) will be discarded.

     ID var1 new4 new5 new6
 1:  1    A    A    B    C
 2:  1    B    A    B    E
 3:  1    C    A    B    F
 4:  1    E    A    C    E
 5:  1    F    A    C    F
 6:  2    A    A    B    C
 7:  2    B    A    B    D
 8:  2    C    A    C    D
 9:  2    D    B    C    D
10:  3    A    A    B    C
11:  3    B    A    B    D
12:  3    C    A    C    D
13:  3    D    B    C    D
14:  4    A    A    B    C
15:  4    B    A    B    C
16:  4    C    A    B    C
17:  5    A    A    B    C
18:  5    B    A    B    D
19:  5    C    A    C    D
20:  5    D    B    C    D
21:  6    C    C    E    F
22:  6    E    C    E    G
23:  6    F    C    F    G
24:  6    G    E    F    G

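The output does not contain enough combinations per ID: ID 1 (A, B, C, E, F) gets only 5 combinations. Is there any way to fix this? The output I want for ID 1 has 10 combinations: (ABC)(ACF)(ABF)(ABE)(BCE)(BCF)(CAB)(CAE)(CAF)(ECF)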

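1 Answer:

Answer 0: (score: 0)

@BIN Since the number of combinations generally does not match the number of unique letters in "var1", you could try the following: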

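library(dplyr)  # only needed here for the %>% pipe
dd[, var1 := as.character(var1)]

# One summary row per ID: combination count, the letters, and all triples
dd[, .(Numb.Combinations = choose(var1 %>% uniqueN, 3),
       ID1 = paste0(var1 %>% unique, collapse = ""),
       Combinations = paste(combn(var1, 3, function(x) paste0(x, collapse = "")),
                            collapse = "-")),
   by = "ID"]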

The output is similar to what you asked for at the end:

   ID Numb.Combinations   ID1                            Combinations
1:  1                10 ABCEF ABC-ABE-ABF-ACE-ACF-AEF-BCE-BCF-BEF-CEF
2:  2                 4  ABCD                         ABC-ABD-ACD-BCD
3:  3                 4  ABCD                         ABC-ABD-ACD-BCD
4:  4                 1   ABC                                     ABC
5:  5                 4  ABCD                         ABC-ABD-ACD-BCD
6:  6                 4  CEFG                         CEF-CEG-CFG-EFG  
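
The mismatch mentioned above can be checked directly by comparing, for each ID, the number of rows in the group with choose(n, 3). The following is a minimal base R sketch that rebuilds the question's data; the names n_rows and n_combos are illustrative and do not appear in the original posts. An assignment with := has to fit the rows a group already has, so the two counts only line up when they happen to be equal, which here is the case for groups with 4 letters.

```r
# Rebuild the question's data and drop duplicated (ID, var1) pairs
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,5,5,5,5,5,6,6,6,6)
var1 <- c("A","B","E","F","C","D","C","A","B","C","A","D","B","C",
          "A","B","C","A","D","C","A","B","C","E","F","G")
dd <- unique(data.frame(ID, var1, stringsAsFactors = FALSE))

# Rows available per ID versus the number of 3-letter combinations
n_rows <- as.vector(table(dd$ID))   # 5 4 4 3 4 4
n_combos <- choose(n_rows, 3)       # 10 4 4 1 4 4

data.frame(ID = sort(unique(dd$ID)), rows = n_rows, combinations = n_combos)
```

For ID 1 there are 10 combinations but only 5 rows, which matches the warning in the question; for ID 4 there is 1 combination for 3 rows, so that single value is recycled.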

Or, if you prefer, following the suggestion from @akrun and @frank:

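# transpose() here is data.table::transpose; it turns the list of triples
# into three columns, so each ID expands to one row per combination
dd <- dd[, c(ID1 = paste0(var1 %>% unique, collapse = ""),
             transpose(combn(sort(var1), 3, simplify = FALSE))), by = ID]
setnames(dd, c("ID", "ID1", "New1", "New2", "New3"))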

Output:

    ID   ID1 New1 New2 New3
 1:  1 ABCEF    A    B    C
 2:  1 ABCEF    A    B    E
 3:  1 ABCEF    A    B    F
 4:  1 ABCEF    A    C    E
 5:  1 ABCEF    A    C    F
 6:  1 ABCEF    A    E    F
 7:  1 ABCEF    B    C    E
 8:  1 ABCEF    B    C    F
 9:  1 ABCEF    B    E    F
10:  1 ABCEF    C    E    F
11:  2  ABCD    A    B    C
12:  2  ABCD    A    B    D
13:  2  ABCD    A    C    D
14:  2  ABCD    B    C    D
15:  3  ABCD    A    B    C
16:  3  ABCD    A    B    D
17:  3  ABCD    A    C    D
18:  3  ABCD    B    C    D
19:  4   ABC    A    B    C
20:  5  ABCD    A    B    C
21:  5  ABCD    A    B    D
22:  5  ABCD    A    C    D
23:  5  ABCD    B    C    D
24:  6  CEFG    C    E    F
25:  6  CEFG    C    E    G
26:  6  CEFG    C    F    G
27:  6  CEFG    E    F    G
    ID   ID1 New1 New2 New3
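
For comparison, the same long layout can also be sketched in base R without data.table. This is not part of the answer above, only an illustration under the assumption that every ID has at least 3 distinct letters (combn() stops with an error otherwise); the names ids, pieces and result are hypothetical.

```r
# Rebuild the question's data and drop duplicated (ID, var1) pairs
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,5,5,5,5,5,6,6,6,6)
var1 <- c("A","B","E","F","C","D","C","A","B","C","A","D","B","C",
          "A","B","C","A","D","C","A","B","C","E","F","G")
dd <- unique(data.frame(ID, var1, stringsAsFactors = FALSE))

# For each ID, build a data frame with one row per 3-letter combination
ids <- sort(unique(dd$ID))
pieces <- lapply(ids, function(id) {
  m <- t(combn(sort(dd$var1[dd$ID == id]), 3))  # one combination per row
  data.frame(ID = id, New1 = m[, 1], New2 = m[, 2], New3 = m[, 3],
             stringsAsFactors = FALSE)
})

# Stack the per-ID pieces into a single table
result <- do.call(rbind, pieces)
nrow(result)  # 27, the same number of rows as the data.table output above
```

The grouping is done by hand here, so this is likely slower than the data.table version on large inputs; it is mainly useful for seeing what the grouped call computes.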