Here is my data:
group <- c(1,1,1,1,2,2,2,3,3,4,4,4,4)
X1 <- c("A","A","A","A","B","A","B","A","A","B","B","B","B")
X2 <- c("A","A","A","A","B","B","B","A","A","B","B","A","A")
X3 <- c("B","A","A","A","B","B","B","B","B","B","B","B","B")
X4 <- c("A","A","A","B","B","B","A","A","A","B","A","B","B")
X5 <- c("A","A","A","A","B","B","B","A","A","A","B","B","B")
X6 <- c("A","A","A","A","B","A","B","A","A","B","B","A","A")
mydf <- data.frame (group, X1, X2, X3, X4, X5, X6)
So the data looks like this:
   group X1 X2 X3 X4 X5 X6
1      1  A  A  B  A  A  A
2      1  A  A  A  A  A  A
3      1  A  A  A  A  A  A
4      1  A  A  A  B  A  A
5      2  B  B  B  B  B  B
6      2  A  B  B  B  B  A
7      2  B  B  B  A  B  B
8      3  A  A  B  A  A  A
9      3  A  A  B  A  A  A
10     4  B  B  B  B  A  B
11     4  B  B  B  A  B  B
12     4  B  A  B  B  B  A
13     4  B  A  B  B  B  A
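Now I need to compare the first row of each group with each of the remaining rows in that group. For example, rows 1 and 2 in group 1:
  group    X1    X2    X3    X4    X5    X6
1     1     A     A     B     A     A     A
2     1     A     A     A     A     A     A
         TRUE  TRUE FALSE  TRUE  TRUE  TRUE
The only mismatch here is at X3, so the mismatch is 1 out of 6 columns = 1/6 ≈ 17%.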
Similarly, compare row 3 with row 1 in group 1:
  group X1 X2 X3 X4 X5 X6
1     1  A  A  B  A  A  A
3     1  A  A  A  A  A  A
Mismatch = 1/6 ≈ 17%
Similarly, compare row 4 with row 1 in group 1:
  group X1 X2 X3 X4 X5 X6
1     1  A  A  B  A  A  A
4     1  A  A  A  B  A  A
Mismatch = 2/6 ≈ 33% (X3 and X4 differ)
The same applies to group 2, whose first row is row 5. Comparing rows 5 and 6:
  group X1 X2 X3 X4 X5 X6
5     2  B  B  B  B  B  B
6     2  A  B  B  B  B  A
Mismatch = 2/6 ≈ 33%
And likewise for rows 5 and 7:
  group X1 X2 X3 X4 X5 X6
5     2  B  B  B  B  B  B
7     2  B  B  B  A  B  B
Mismatch = 1/6 ≈ 17%
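The hand calculations above can be checked with a short sketch. The helper name `mismatch` is hypothetical (it does not appear in the question), and `stringsAsFactors = FALSE` is an assumption made so the columns are plain character vectors in any R version.

```r
# Rebuild the example data from the question.
mydf <- data.frame(
  group = c(1,1,1,1,2,2,2,3,3,4,4,4,4),
  X1 = c("A","A","A","A","B","A","B","A","A","B","B","B","B"),
  X2 = c("A","A","A","A","B","B","B","A","A","B","B","A","A"),
  X3 = c("B","A","A","A","B","B","B","B","B","B","B","B","B"),
  X4 = c("A","A","A","B","B","B","A","A","A","B","A","B","B"),
  X5 = c("A","A","A","A","B","B","B","A","A","A","B","B","B"),
  X6 = c("A","A","A","A","B","A","B","A","A","B","B","A","A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of the X columns in which rows i and j differ.
# Column 1 (`group`) is dropped before comparing.
mismatch <- function(df, i, j) {
  mean(unlist(df[i, -1]) != unlist(df[j, -1]))
}

mismatch(mydf, 1, 2)  # 1/6, about 17%
mismatch(mydf, 1, 4)  # 2/6, about 33%
mismatch(mydf, 5, 7)  # 1/6, about 17%
```

Taking the mean of a logical vector gives the proportion of `TRUE` values, which is why `mean(... != ...)` yields the mismatch share directly.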
My attempt:
match (mydf[1,], mydf[2,])
match (mydf[1,], mydf[3,])
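A likely reason this attempt does not give the desired result is that `match()` performs a lookup (the position of the first occurrence of each value in the second argument) rather than a position-by-position comparison. The sketch below contrasts the two on rows 1 and 2; the variable names `r1`, `r2`, `pos`, and `same` are illustrative only.

```r
# Rows 1 and 2 of the question's data, columns X1..X6, as named character vectors.
r1 <- c(X1 = "A", X2 = "A", X3 = "B", X4 = "A", X5 = "A", X6 = "A")
r2 <- c(X1 = "A", X2 = "A", X3 = "A", X4 = "A", X5 = "A", X6 = "A")

# match(): for each value in r1, the index of its FIRST occurrence in r2.
# "A" is found at position 1; "B" does not occur in r2, so it gives NA.
pos <- match(r1, r2)
pos   # 1 1 NA 1 1 1

# `==`: element-wise comparison, column against the same column.
same <- r1 == r2
same  # TRUE TRUE FALSE TRUE TRUE TRUE

# Share of mismatching columns.
mean(!same)  # 1/6
```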
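Answer 0 (score: 6)
Try this:
# For the rows x of one group, compare every row with the group's first row,
# column by column (column 1 is `group`, so it is dropped), and append the
# share of matching columns as `match_ratio`.
match_ratio <- function(x)
  cbind(x, match_ratio = rowMeans(mapply(`==`, x[1, -1], x[, -1])))
library(plyr)
# Apply match_ratio to each group and stack the results.
ddply(mydf, "group", match_ratio)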
#    group X1 X2 X3 X4 X5 X6 match_ratio
# 1      1  A  A  B  A  A  A   1.0000000
# 2      1  A  A  A  A  A  A   0.8333333
# 3      1  A  A  A  A  A  A   0.8333333
# 4      1  A  A  A  B  A  A   0.6666667
# 5      2  B  B  B  B  B  B   1.0000000
# 6      2  A  B  B  B  B  A   0.6666667
# 7      2  B  B  B  A  B  B   0.8333333
# 8      3  A  A  B  A  A  A   1.0000000
# 9      3  A  A  B  A  A  A   1.0000000
# 10     4  B  B  B  B  A  B   1.0000000
# 11     4  B  B  B  A  B  B   0.6666667
# 12     4  B  A  B  B  B  A   0.5000000
# 13     4  B  A  B  B  B  A   0.5000000
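Note that the output above reports the share of matching columns, whereas the question asks for the share of mismatches, which is simply one minus that value. As a rough sketch, the same per-group comparison can also be written in base R without plyr. The helper name `mismatch_vs_first` and the column name `mismatch` are hypothetical, and `stringsAsFactors = FALSE` is an assumption so that the comparison runs on plain strings.

```r
# Rebuild the example data from the question.
mydf <- data.frame(
  group = c(1,1,1,1,2,2,2,3,3,4,4,4,4),
  X1 = c("A","A","A","A","B","A","B","A","A","B","B","B","B"),
  X2 = c("A","A","A","A","B","B","B","A","A","B","B","A","A"),
  X3 = c("B","A","A","A","B","B","B","B","B","B","B","B","B"),
  X4 = c("A","A","A","B","B","B","A","A","A","B","A","B","B"),
  X5 = c("A","A","A","A","B","B","B","A","A","A","B","B","B"),
  X6 = c("A","A","A","A","B","A","B","A","A","B","B","A","A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one group's rows, the share of X columns in which
# each row differs from the group's first row.
mismatch_vs_first <- function(x) {
  first <- unlist(x[1, -1])
  # drop = FALSE keeps a one-row group as a data frame so apply() still works.
  x$mismatch <- apply(x[, -1, drop = FALSE], 1, function(r) mean(r != first))
  x
}

# Split by group, process each piece, and stack the pieces again.
res <- do.call(rbind, lapply(split(mydf, mydf$group), mismatch_vs_first))
rownames(res) <- NULL

# Mismatch in percent, rounded to whole numbers.
round(100 * res$mismatch)
# 0 17 17 33 0 33 17 0 0 0 33 50 50
```

The first row of every group is compared with itself and therefore always gets a mismatch of 0.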
Answer 1 (score: 2)
## generate pairs of row numbers (each unordered pair once)
rows <- sequence(nrow(mydf))
grid <- subset(expand.grid(Var1 = rows, Var2 = rows), Var1 > Var2)
## define some functions
## comparison1: position of the first occurrence of each value of row a in row b
comparison1 <- function(a, b, x)
  match(x[a, -1], x[b, -1])
## comparison2: column-by-column equality of rows a and b
comparison2 <- function(a, b, x)
  x[a, -1] == x[b, -1]
## apply (comparison1 or comparison2)
matches <- t(mapply(comparison1, grid$Var2, grid$Var1, MoreArgs = list(x = mydf)))
dimnames(matches) <- list(paste(grid$Var2, grid$Var1, sep = ","),
                          names(mydf)[-1])
If you use comparison1:
> head(matches)
    X1 X2 X3 X4 X5 X6
1,2  1  1 NA  1  1  1
1,3  1  1 NA  1  1  1
1,4  1  1  4  1  1  1
1,5 NA NA  1 NA NA NA
1,6  1  1  2  1  1  1
1,7  4  4  1  4  4  4
If you use comparison2:
> head(matches)
       X1    X2    X3    X4    X5    X6
1,2  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
1,3  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
1,4  TRUE  TRUE FALSE FALSE  TRUE  TRUE
1,5 FALSE FALSE  TRUE FALSE FALSE FALSE
1,6  TRUE FALSE  TRUE FALSE FALSE  TRUE
1,7 FALSE FALSE  TRUE  TRUE FALSE FALSE
The row names correspond to the row numbers of the two rows being compared.
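The approach above compares every pair of rows, including pairs from different groups, while the question only needs each group's first row against the other rows of the same group. One possible way to narrow it down is sketched below, using the element-wise comparison. The names `first_row`, `keep`, `cmp`, and `mismatch` are hypothetical, and `stringsAsFactors = FALSE` is an assumption so the columns are plain strings.

```r
# Rebuild the example data from the question.
mydf <- data.frame(
  group = c(1,1,1,1,2,2,2,3,3,4,4,4,4),
  X1 = c("A","A","A","A","B","A","B","A","A","B","B","B","B"),
  X2 = c("A","A","A","A","B","B","B","A","A","B","B","A","A"),
  X3 = c("B","A","A","A","B","B","B","B","B","B","B","B","B"),
  X4 = c("A","A","A","B","B","B","A","A","A","B","A","B","B"),
  X5 = c("A","A","A","A","B","B","B","A","A","A","B","B","B"),
  X6 = c("A","A","A","A","B","A","B","A","A","B","B","A","A"),
  stringsAsFactors = FALSE
)

# All pairs of row numbers with Var2 < Var1, as in the answer.
rows <- seq_len(nrow(mydf))
grid <- subset(expand.grid(Var1 = rows, Var2 = rows), Var1 > Var2)

# For every row, the index of the first row of its group.
first_row <- match(mydf$group, mydf$group)

# Keep only pairs where Var2 is the first row of Var1's group.
keep <- first_row[grid$Var1] == grid$Var2
grid <- grid[keep, ]

# Column-by-column equality for each remaining pair (one row per pair).
cmp <- t(mapply(function(a, b) unlist(mydf[a, -1]) == unlist(mydf[b, -1]),
                grid$Var2, grid$Var1))
rownames(cmp) <- paste(grid$Var2, grid$Var1, sep = ",")

# Share of mismatching columns per pair.
mismatch <- rowMeans(!cmp)
mismatch
```

With the example data this leaves nine pairs: 1,2; 1,3; 1,4; 5,6; 5,7; 8,9; 10,11; 10,12; and 10,13.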