Counting combinations of factors and their reversed pairs at the same time

Time: 2016-05-31 19:03:34

Tags: r count

I have a data frame with 2 columns of factor variables, like this:

V1 <- c("A","B","C","Y","D","E","F","U","G","H","I","J","R")
V2 <- c("Z","Y","W","B","V","U","T","E","S","R","Q","P","H")
df <- cbind(V1,V2)
df
 V1  V2 
[1,] "A" "Z"
[2,] "B" "Y"
[3,] "C" "W"
[4,] "Y" "B"
[5,] "D" "V"
[6,] "E" "U"
[7,] "F" "T"
[8,] "U" "E"
[9,] "G" "S"
[10,] "H" "R"
[11,] "I" "Q"
[12,] "J" "P"
[13,] "R" "H"

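Now I want a function that counts the cases where a combination of V1 and V2 also appears reversed, as a combination of V2 and V1, and that returns all of those cases. For df, for example, this count would be 3, as shown below: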

 y <-combinations_inver(df[,1],df[,2])

 y$Combinations
 "B""Y"= "Y""B"
 "E""U"= "U""E"
 "H""R"= "R""H"

 y$Count
[1] 3 #because there are three occurrences (see $Combinations)

2 answers:

Answer 0: (score: 3)

One simple way is to build a key for each row in both directions:

# One string per row, e.g. "B Y"
forwards <- paste(V1, V2)
# The same rows with the columns swapped, e.g. "Y B"
backwards <- paste(V2, V1)

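The intersection of these two "sets" is what you are looking for. Each matching pair shows up in it twice, once in each direction ("B Y" and "Y B"), so you need to divide the length by 2: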

length(intersect(forwards, backwards))/2
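The same idea can be wrapped into a function with the shape the question asks for. This is only a sketch: `combinations_inver` is the hypothetical name taken from the question, not a function from any package, and the list layout (`Combinations` as a data frame, `Count` as a number) is an assumption. It also assumes that the values contain no spaces, since `paste` joins them with a space, and it deliberately skips rows where both columns hold the same value, which would otherwise contribute 0.5 to the count above.

```r
# Hypothetical helper, named after the call in the question
combinations_inver <- function(v1, v2) {
  # Work on character vectors so factors and matrix columns behave alike
  v1 <- as.character(v1)
  v2 <- as.character(v2)
  forwards  <- paste(v1, v2)
  backwards <- paste(v2, v1)
  # Rows whose reversed pair occurs somewhere in the data,
  # ignoring rows where both values are identical
  hit <- forwards %in% backwards & v1 != v2
  # Keep one representative per mirrored pair: the row with v1 < v2
  keep <- hit & v1 < v2
  pairs <- unique(data.frame(V1 = v1[keep], V2 = v2[keep],
                             stringsAsFactors = FALSE))
  list(Combinations = pairs, Count = nrow(pairs))
}

V1 <- c("A","B","C","Y","D","E","F","U","G","H","I","J","R")
V2 <- c("Z","Y","W","B","V","U","T","E","S","R","Q","P","H")
y <- combinations_inver(V1, V2)
y$Combinations  # rows B/Y, E/U, H/R
y$Count         # 3
```

Keeping only the row with `v1 < v2` replaces the division by 2: each mirrored pair is reported once, in alphabetical order.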

Answer 1: (score: 1)

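We can use pmin/pmax to reorder the elements of each row, then use duplicated to find the indices of the rows that occur more than once, take the unique rows after subsetting, and get the count with nrow: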

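# Put each pair in a canonical order (smaller value first), so "Y","B" becomes "B","Y"
m1 <- cbind(pmin(df[,1], df[,2]), pmax(df[,1], df[,2]))
# Flag every row whose ordered pair occurs more than once
i1 <- duplicated(m1)|duplicated(m1, fromLast=TRUE)
# Count the distinct pairs among the flagged rows
nrow(unique(m1[i1,]))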
#[1] 3
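The question mentions factor columns, while the example data above is a character matrix built with cbind. A minimal sketch of the same approach on an actual data frame with factor columns might look like this; the data frame `dat` and its four rows are made up for illustration, and the columns are converted with as.character first as a precaution, since comparing unordered factors is not meaningful in R. Note that this approach also counts a pair that is simply repeated in the same order (for example "A","Z" appearing twice), not only reversed pairs.

```r
# Hypothetical data frame with factor columns (illustration only)
dat <- data.frame(V1 = factor(c("A", "B", "C", "Y")),
                  V2 = factor(c("Z", "Y", "W", "B")))

# Convert to character before comparing
a <- as.character(dat$V1)
b <- as.character(dat$V2)

# Canonical order per row, then flag ordered pairs that occur more than once
m <- cbind(pmin(a, b), pmax(a, b))
dup <- duplicated(m) | duplicated(m, fromLast = TRUE)

# drop = FALSE keeps the result a matrix whatever the number of flagged rows
res <- unique(m[dup, , drop = FALSE])
nrow(res)  # 1: only the B/Y pair appears in both directions
```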