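I have a large data frame and, for every pair of individuals (rows), I need to find the columns in which the two rows hold the same value.
Here is an example of the data frame: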
>data
ID pos1234 pos1345 pos1456 pos1678
1 1 C A C G
2 2 C G A G
3 3 C A G A
4 4 C G C T
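For anyone who wants to reproduce this, the example can be rebuilt roughly as follows. This is a minimal sketch: it assumes the letters are stored as character strings rather than factors, which the printed output above does not reveal.

```r
# Rebuild the example data frame shown above.
# Assumption: the letter columns are character vectors, not factors.
data <- data.frame(
  ID      = 1:4,
  pos1234 = c("C", "C", "C", "C"),
  pos1345 = c("A", "G", "A", "G"),
  pos1456 = c("C", "A", "G", "C"),
  pos1678 = c("G", "G", "A", "T"),
  stringsAsFactors = FALSE
)
data
```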
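I converted the data frame into a pairwise matrix, in which each consecutive pair of rows is one combination of two individuals: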
apply(data, 2, combn, m=2)
ID pos1234 pos1345 pos1456 pos1678
[1,] "1" "C" "A" "C" "G"
[2,] "2" "C" "G" "A" "G"
[3,] "1" "C" "A" "C" "G"
[4,] "3" "C" "A" "G" "A"
[5,] "1" "C" "A" "C" "G"
[6,] "4" "C" "G" "C" "T"
[7,] "2" "C" "G" "A" "G"
[8,] "3" "C" "A" "G" "A"
[9,] "2" "C" "G" "A" "G"
[10,] "4" "C" "G" "C" "T"
[11,] "3" "C" "A" "G" "A"
[12,] "4" "C" "G" "C" "T"
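I am now stuck on identifying the columns that contain the same letter within each pair. For example, for the pair 1 and 2, the columns containing the same letter would be pos1234 and pos1678.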
Is it possible to get, for each pair of individuals, a data frame containing only the identical letters?
Thanks in advance.
Answer 0 (score: 1)
You can pass a function to combn, which then applies it to each pair of row indices:
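# For each pair of rows x, keep the names of the columns whose two values match.
# lapply (not sapply) keeps the result a list even when every column differs.
res0 <- combn(nrow(data), 2, FUN = function(x)
names(data[-1])[ lengths(lapply(data[x,-1], unique)) == 1 ], simplify=FALSE)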
which gives
[[1]]
[1] "pos1234" "pos1678"
[[2]]
[1] "pos1234" "pos1345"
[[3]]
[1] "pos1234" "pos1456"
[[4]]
[1] "pos1234"
[[5]]
[1] "pos1234" "pos1345"
[[6]]
[1] "pos1234"
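To see what happens for a single pair, the step can be unpacked as in the sketch below. The variable names `x`, `pair` and `n_unique` are introduced here for illustration only, and the data frame is rebuilt under the assumption that the letters are character strings.

```r
# Example data (assumed to hold character columns)
data <- data.frame(
  ID      = 1:4,
  pos1234 = c("C", "C", "C", "C"),
  pos1345 = c("A", "G", "A", "G"),
  pos1456 = c("C", "A", "G", "C"),
  pos1678 = c("G", "G", "A", "T"),
  stringsAsFactors = FALSE
)

x <- c(1, 2)                               # row indices of the pair
pair <- data[x, -1]                        # the two rows, without the ID column
n_unique <- lengths(lapply(pair, unique))  # number of distinct letters per column
n_unique
names(pair)[n_unique == 1]                 # columns where both rows agree
```

A column in which both rows carry the same letter has exactly one unique value, so filtering on `n_unique == 1` leaves pos1234 and pos1678 for this pair.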
To find out which pair each of [[1]] .. [[6]] corresponds to, use combn again, this time on the IDs:
res <- setNames(res0, combn(data$ID, 2, paste, collapse="."))
which gives
$`1.2`
[1] "pos1234" "pos1678"
$`1.3`
[1] "pos1234" "pos1345"
$`1.4`
[1] "pos1234" "pos1456"
$`2.3`
[1] "pos1234"
$`2.4`
[1] "pos1234" "pos1345"
$`3.4`
[1] "pos1234"
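The question also asked for a data frame holding only the identical letters for each pair. One possible way to get there is sketched below. It is not part of the approach above: the names `ids` and `shared` are hypothetical, and the data frame is rebuilt under the assumption that the IDs are unique and the letters are character strings.

```r
# Example data (assumed to hold character columns and unique IDs)
data <- data.frame(
  ID      = 1:4,
  pos1234 = c("C", "C", "C", "C"),
  pos1345 = c("A", "G", "A", "G"),
  pos1456 = c("C", "A", "G", "C"),
  pos1678 = c("G", "G", "A", "T"),
  stringsAsFactors = FALSE
)

ids <- combn(data$ID, 2)  # 2-row matrix, one column per pair of IDs

shared <- lapply(seq_len(ncol(ids)), function(k) {
  # The two rows of this pair, without the ID column
  rows <- data[data$ID %in% ids[, k], -1, drop = FALSE]
  # TRUE for columns in which both rows carry the same letter
  same <- vapply(rows, function(col) length(unique(col)) == 1, logical(1))
  # One-row data frame with just the shared letters
  rows[1, same, drop = FALSE]
})
names(shared) <- apply(ids, 2, paste, collapse = ".")

shared[["1.2"]]
```

Here `shared[["1.2"]]` is a one-row data frame with pos1234 = "C" and pos1678 = "G". `drop = FALSE` keeps the result a data frame even when only one column matches.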