I have a data frame with many columns, as shown below
Column1 Column2 Column3
Q9Y6Y8 P28074 Q9Y6A4
Q9Y6W5 P28066 Q9Y623
Q9Y6H1 P27695 Q9Y5W9
Q5T1J5 P25786;Q9Y623
Q9Y6A4
Q9Y623;P27695;Q9Y623
Q9Y5W9
Q9Y6Y8
So first I want to put them all together and get the unique values
Q9Y6Y8
Q9Y6W5
Q9Y6H1
Q5T1J5
Q9Y6A4
Q9Y623
P27695
Q9Y623
Q9Y5W9
Q9Y6Y8
P25786
P28074
P28066
Then I want the combinations of two strings, as shown below
Q9Y6Y8 Q9Y6W5
Q9Y6Y8 Q9Y6H1
Q9Y6Y8 Q9Y6A4
Q9Y6Y8 Q5T1J5
Q9Y6Y8 Q9Y6A4
Q9Y6Y8 Q9Y623
Q9Y6Y8 P27695
Q9Y6Y8 Q9Y623
.
.
.
Q9Y6W5 Q9Y6H1
Q9Y6W5 Q9Y6A4
Q9Y6W5 Q5T1J5
.
.
.
until every string has been paired once with every other string
Answer 0 (score: 3)
We can do this by unlisting the data.frame (a data.frame is a list) into a vector, splitting on ";" with strsplit, unlisting the output of strsplit, and keeping the unique elements as a vector.
Un1 <- unique(unlist(strsplit(unlist(df1), ";")))
From this, we can use expand.grid, which returns every ordered pairing, including each string paired with itself.
expand.grid(Un1, Un1)
Or, if we only need each pair once, combn can be used.
t(combn(Un1, 2))
# [,1] [,2]
# [1,] "Q9Y6Y8" "Q9Y6W5"
# [2,] "Q9Y6Y8" "Q9Y6H1"
# [3,] "Q9Y6Y8" "Q5T1J5"
# [4,] "Q9Y6Y8" "Q9Y6A4"
# [5,] "Q9Y6Y8" "Q9Y623"
# [6,] "Q9Y6Y8" "P27695"
# [7,] "Q9Y6Y8" "Q9Y5W9"
# [8,] "Q9Y6Y8" "P28074"
# [9,] "Q9Y6Y8" "P28066"
#[10,] "Q9Y6Y8" "P25786"
#[11,] "Q9Y6W5" "Q9Y6H1"
#[12,] "Q9Y6W5" "Q5T1J5"
#[13,] "Q9Y6W5" "Q9Y6A4"
#[14,] "Q9Y6W5" "Q9Y623"
#[15,] "Q9Y6W5" "P27695"
#[16,] "Q9Y6W5" "Q9Y5W9"
#[17,] "Q9Y6W5" "P28074"
#[18,] "Q9Y6W5" "P28066"
#[19,] "Q9Y6W5" "P25786"
#[20,] "Q9Y6H1" "Q5T1J5"
#[21,] "Q9Y6H1" "Q9Y6A4"
#[22,] "Q9Y6H1" "Q9Y623"
#[23,] "Q9Y6H1" "P27695"
#[24,] "Q9Y6H1" "Q9Y5W9"
#[25,] "Q9Y6H1" "P28074"
#[26,] "Q9Y6H1" "P28066"
#[27,] "Q9Y6H1" "P25786"
#[28,] "Q5T1J5" "Q9Y6A4"
#[29,] "Q5T1J5" "Q9Y623"
#[30,] "Q5T1J5" "P27695"
#[31,] "Q5T1J5" "Q9Y5W9"
#[32,] "Q5T1J5" "P28074"
#[33,] "Q5T1J5" "P28066"
#[34,] "Q5T1J5" "P25786"
#[35,] "Q9Y6A4" "Q9Y623"
#[36,] "Q9Y6A4" "P27695"
#[37,] "Q9Y6A4" "Q9Y5W9"
#[38,] "Q9Y6A4" "P28074"
#[39,] "Q9Y6A4" "P28066"
#[40,] "Q9Y6A4" "P25786"
#[41,] "Q9Y623" "P27695"
#[42,] "Q9Y623" "Q9Y5W9"
#[43,] "Q9Y623" "P28074"
#[44,] "Q9Y623" "P28066"
#[45,] "Q9Y623" "P25786"
#[46,] "P27695" "Q9Y5W9"
#[47,] "P27695" "P28074"
#[48,] "P27695" "P28066"
#[49,] "P27695" "P25786"
#[50,] "Q9Y5W9" "P28074"
#[51,] "Q9Y5W9" "P28066"
#[52,] "Q9Y5W9" "P25786"
#[53,] "P28074" "P28066"
#[54,] "P28074" "P25786"
#[55,] "P28066" "P25786"
Note: here I assume that the columns are all of class character.
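For anyone who wants to try this end to end, here is a minimal self-contained sketch. The construction of df1 is an assumption: the question does not show how the blank cells are stored, so the shorter columns are padded with "" here. The names ids and pairs, and the filtering step that drops NA or empty entries, are additions for illustration and are not part of the answer above.

```r
# Hypothetical reconstruction of the question's data. Padding the short
# columns with "" is an assumption about how the blanks were stored.
df1 <- data.frame(
  Column1 = c("Q9Y6Y8", "Q9Y6W5", "Q9Y6H1", "Q5T1J5", "Q9Y6A4",
              "Q9Y623;P27695;Q9Y623", "Q9Y5W9", "Q9Y6Y8"),
  Column2 = c("P28074", "P28066", "P27695", "P25786;Q9Y623", "", "", "", ""),
  Column3 = c("Q9Y6A4", "Q9Y623", "Q9Y5W9", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# strsplit() only accepts character input, so convert any factor columns.
df1[] <- lapply(df1, as.character)

# Flatten column by column, split on ";", and keep each ID once.
ids <- unique(unlist(strsplit(unlist(df1), ";")))

# Defensive step: drop NA or empty entries in case blanks were stored as NA.
ids <- ids[!is.na(ids) & nzchar(ids)]

# Every unordered pair exactly once, one pair per row.
pairs <- t(combn(ids, 2))

length(ids)
nrow(pairs)
```

With this padding, ids should hold 11 identifiers and pairs should have choose(11, 2) = 55 rows, matching the output listed in the answer. If the real data stores blanks as NA, the filtering line keeps NA from showing up as one of the strings being paired.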