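I have a data frame with two columns of factor variables (built here with cbind, so it prints as a character matrix), as shown below: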
V1 <- c("A","B","C","Y","D","E","F","U","G","H","I","J","R")
V2 <- c("Z","Y","W","B","V","U","T","E","S","R","Q","P","H")
df <- cbind(V1,V2)
df
V1 V2
[1,] "A" "Z"
[2,] "B" "Y"
[3,] "C" "W"
[4,] "Y" "B"
[5,] "D" "V"
[6,] "E" "U"
[7,] "F" "T"
[8,] "U" "E"
[9,] "G" "S"
[10,] "H" "R"
[11,] "I" "Q"
[12,] "J" "P"
[13,] "R" "H"
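Now I would like a function that finds every case where a (V1, V2) combination also occurs reversed as (V2, V1), and returns those cases together with their count. For df, the count would be 3, as shown below:
y <- combinations_inver(df[,1], df[,2])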
y$Combinations
"B""Y"= "Y""B"
"E""U"= "U""E"
"H""R"= "R""H"
y$Count
[1] 3 # because there are three occurrences (see $Combinations)
Answer 0 (score: 3)
A simple approach is:
forwards <- paste(V1, V2)    # each row as "V1 V2"
backwards <- paste(V2, V1)   # each row as "V2 V1"
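The intersection of these two "sets" is what you are looking for, but R returns each match twice (once in each direction), so you need to divide the length by 2: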
length(intersect(forwards, backwards))/2
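The same idea can be wrapped into the function the question asks for. This is only a minimal sketch: the name combinations_inver and the $Combinations / $Count fields come from the question, but the body, the output string format, and the rule of keeping one representative per pair (the row where the first value sorts before the second) are assumptions. It also assumes the values contain no spaces, since paste joins them with a space.

```r
V1 <- c("A","B","C","Y","D","E","F","U","G","H","I","J","R")
V2 <- c("Z","Y","W","B","V","U","T","E","S","R","Q","P","H")

# Hypothetical implementation of the function named in the question.
combinations_inver <- function(a, b) {
  a <- as.character(a)               # also works if the inputs are factors
  b <- as.character(b)
  forwards  <- paste(a, b)
  backwards <- paste(b, a)
  # Keep rows whose reversed pair also occurs; a < b picks one row per
  # pair and drops self-pairs such as ("A", "A").
  keep  <- forwards %in% backwards & a < b
  pairs <- unique(cbind(a[keep], b[keep]))
  list(
    Combinations = sprintf("%s %s = %s %s",
                           pairs[, 1], pairs[, 2], pairs[, 2], pairs[, 1]),
    Count = nrow(pairs)
  )
}

y <- combinations_inver(V1, V2)
y$Count
```

Unlike dividing the length of the intersection by 2, this version does not count a self-pair such as ("A", "A") as half a match; whether that is the desired behavior depends on the data.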
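Answer 1 (score: 1)
We can use pmin and pmax to reorder the elements of each row, then use duplicated to find the indices of the duplicated rows, take the unique rows after subsetting, and get the nrow.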
m1 <- cbind(pmin(df[,1], df[,2]), pmax(df[,1], df[,2]))
i1 <- duplicated(m1)|duplicated(m1, fromLast=TRUE)
nrow(unique(m1[i1,]))
#[1] 3
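If the columns really are factors in a data frame, as the question describes, one cautious variant is to convert them to character before calling pmin and pmax. The sketch below assumes a data frame named df2 (a hypothetical name, not from the original post) built with stringsAsFactors = TRUE.

```r
V1 <- c("A","B","C","Y","D","E","F","U","G","H","I","J","R")
V2 <- c("Z","Y","W","B","V","U","T","E","S","R","Q","P","H")

# df2 is a hypothetical data frame with factor columns
df2 <- data.frame(V1 = V1, V2 = V2, stringsAsFactors = TRUE)

a <- as.character(df2$V1)
b <- as.character(df2$V2)

# Put the smaller value of each row first, so (Y, B) and (B, Y) look identical
m1 <- cbind(pmin(a, b), pmax(a, b))

# Flag every row that has at least one twin after reordering
i1 <- duplicated(m1) | duplicated(m1, fromLast = TRUE)

# drop = FALSE keeps the result a matrix regardless of how many rows match
res <- unique(m1[i1, , drop = FALSE])
nrow(res)
```

One limitation to keep in mind: because the rows are reordered before comparison, two identical rows in the same direction (for example ("A", "Z") appearing twice) would also be counted, even though neither is a reversed pair.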