How to make combinations based on strings

Time: 2016-07-05 15:15:00

Tags: r

I have a data frame with many columns, like the following

Column1           Column2           Column3
Q9Y6Y8             P28074           Q9Y6A4
Q9Y6W5             P28066           Q9Y623
Q9Y6H1             P27695           Q9Y5W9
Q5T1J5             P25786;Q9Y623 
Q9Y6A4
Q9Y623;P27695;Q9Y623
Q9Y5W9
Q9Y6Y8

So first I want to put them all together and get the unique values

Q9Y6Y8                        
Q9Y6W5                     
Q9Y6H1                       
Q5T1J5             
Q9Y6A4
Q9Y623
P27695
Q9Y623
Q9Y5W9
Q9Y6Y8 
P25786
P28074
P28066   

Then I want combinations of two strings, like the following

Q9Y6Y8 Q9Y6W5   
Q9Y6Y8 Q9Y6H1                       
Q9Y6Y8 Q9Y6A4                           
Q9Y6Y8 Q5T1J5             
Q9Y6Y8 Q9Y6A4
Q9Y6Y8 Q9Y623
Q9Y6Y8 P27695
Q9Y6Y8 Q9Y623
    .
    .
    .
Q9Y6W5 Q9Y6H1
Q9Y6W5 Q9Y6A4
Q9Y6W5 Q5T1J5 
    .
    .
    .

until every string has been paired with every other string once

1 answer:

Answer 0 (score: 3)

We can do this by unlisting the data.frame into a vector (a data.frame is a list), splitting on ";" with strsplit, unlisting the output (which is a list), and keeping the unique elements as a vector

    Un1 <- unique(unlist(strsplit(unlist(df1), ";")))

From this, we can get all the combinations with expand.grid

    expand.grid(Un1, Un1)

Or, if we only need the limited set of combinations in which each pair appears once, combn can be used

    t(combn(Un1, 2))
    #      [,1]     [,2]
    # [1,] "Q9Y6Y8" "Q9Y6W5"
    # [2,] "Q9Y6Y8" "Q9Y6H1"
    # [3,] "Q9Y6Y8" "Q5T1J5"
    # [4,] "Q9Y6Y8" "Q9Y6A4"
    # [5,] "Q9Y6Y8" "Q9Y623"
    # [6,] "Q9Y6Y8" "P27695"
    # [7,] "Q9Y6Y8" "Q9Y5W9"
    # [8,] "Q9Y6Y8" "P28074"
    # [9,] "Q9Y6Y8" "P28066"
    #[10,] "Q9Y6Y8" "P25786"
    #[11,] "Q9Y6W5" "Q9Y6H1"
    #[12,] "Q9Y6W5" "Q5T1J5"
    #[13,] "Q9Y6W5" "Q9Y6A4"
    #[14,] "Q9Y6W5" "Q9Y623"
    #[15,] "Q9Y6W5" "P27695"
    #[16,] "Q9Y6W5" "Q9Y5W9"
    #[17,] "Q9Y6W5" "P28074"
    #[18,] "Q9Y6W5" "P28066"
    #[19,] "Q9Y6W5" "P25786"
    #[20,] "Q9Y6H1" "Q5T1J5"
    #[21,] "Q9Y6H1" "Q9Y6A4"
    #[22,] "Q9Y6H1" "Q9Y623"
    #[23,] "Q9Y6H1" "P27695"
    #[24,] "Q9Y6H1" "Q9Y5W9"
    #[25,] "Q9Y6H1" "P28074"
    #[26,] "Q9Y6H1" "P28066"
    #[27,] "Q9Y6H1" "P25786"
    #[28,] "Q5T1J5" "Q9Y6A4"
    #[29,] "Q5T1J5" "Q9Y623"
    #[30,] "Q5T1J5" "P27695"
    #[31,] "Q5T1J5" "Q9Y5W9"
    #[32,] "Q5T1J5" "P28074"
    #[33,] "Q5T1J5" "P28066"
    #[34,] "Q5T1J5" "P25786"
    #[35,] "Q9Y6A4" "Q9Y623"
    #[36,] "Q9Y6A4" "P27695"
    #[37,] "Q9Y6A4" "Q9Y5W9"
    #[38,] "Q9Y6A4" "P28074"
    #[39,] "Q9Y6A4" "P28066"
    #[40,] "Q9Y6A4" "P25786"
    #[41,] "Q9Y623" "P27695"
    #[42,] "Q9Y623" "Q9Y5W9"
    #[43,] "Q9Y623" "P28074"
    #[44,] "Q9Y623" "P28066"
    #[45,] "Q9Y623" "P25786"
    #[46,] "P27695" "Q9Y5W9"
    #[47,] "P27695" "P28074"
    #[48,] "P27695" "P28066"
    #[49,] "P27695" "P25786"
    #[50,] "Q9Y5W9" "P28074"
    #[51,] "Q9Y5W9" "P28066"
    #[52,] "Q9Y5W9" "P25786"
    #[53,] "P28074" "P28066"
    #[54,] "P28074" "P25786"
    #[55,] "P28066" "P25786"

NOTE: Here I am assuming that the columns are all of character class.
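For anyone who wants to try this end to end, here is a minimal reproducible sketch. The question does not say how the shorter columns are padded, so the construction of df1 below is hypothetical: it assumes empty strings (NA padding would put an NA into the unique values). The as.character, trimws and nzchar steps are extra safeguards that are not part of the answer's one-liner; they are meant to cover factor columns and stray whitespace.

```r
# Hypothetical reconstruction of the question's data. Padding the short
# columns with "" is an assumption; the original padding is not stated.
df1 <- data.frame(
  Column1 = c("Q9Y6Y8", "Q9Y6W5", "Q9Y6H1", "Q5T1J5", "Q9Y6A4",
              "Q9Y623;P27695;Q9Y623", "Q9Y5W9", "Q9Y6Y8"),
  Column2 = c("P28074", "P28066", "P27695", "P25786;Q9Y623", "", "", "", ""),
  Column3 = c("Q9Y6A4", "Q9Y623", "Q9Y5W9", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Flatten column by column, coerce to character (this also handles factor
# columns), split on ";", and trim surrounding whitespace.
ids <- trimws(unlist(strsplit(as.character(unlist(df1)), ";")))

# Keep the unique, non-empty identifiers in order of first appearance.
Un1 <- unique(ids[nzchar(ids)])

# Unordered pairs: every identifier paired once with every other identifier.
pairs <- t(combn(Un1, 2))

# Ordered pairs, including each identifier paired with itself.
all_pairs <- expand.grid(Un1, Un1, stringsAsFactors = FALSE)

length(Un1)      # 11 unique identifiers
nrow(pairs)      # choose(11, 2) = 55 rows
nrow(all_pairs)  # 11 * 11 = 121 rows
```

With this data, combn yields the 55 "each pair once" rows asked for in the question, while expand.grid yields 121 rows because it also keeps reversed pairs and self-pairs.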