一段时间以来,我一直在尝试构建一个矩阵,该矩阵由两个等级列表之间共有的元素数量组成。
以下是一些虚拟数据:
site<-c('A','A','A','A','A','A','A','A','A','B','B','B','B','B','B')
group<-c('A1','A1','A2','A2','A2','A3','A3','A3','A3',
'B1','B1','B2','B2','B2','B2')
element<-c("red","orange","blue","black","white", "black","cream","yellow","purple","red","orange","blue","white","gray","salmon")
d<-cbind(site,group,element)
我创建了一个列表结构,假设由于每个列表中os元素的数量不同,它会成为程序性的。另外,由于我不想在组之间进行所有可能的比较,而只希望在站点之间进行比较。
#first level list - by site
sitelist<-split(nodmod, list(nodmod$site),drop = TRUE)
#list by group
nestedlist <- lapply(sitelist, function(x) split(x, x[['mod']], drop = TRUE))
我的意图是创建一个表或矩阵,其中两个站点的组之间的元素数量相同(我的原始数据还有其他站点)。像这样:
A1 A2 A3
B1 2 0 0
B2 0 2 0
这个问题的嵌套性质对我构成了挑战。我对列表不太熟悉,因为我主要使用数据框解决了问题。我的尝试归结为这一点。我觉得它已经接近了,但是循环的正确语法有很多缺点。
t <- outer(1:length(d$A),
1:length(d$B),
FUN=function(i,j){
sapply(1:length(i),
FUN=function(x)
length(intersect(d$A[[i]]$element, d$B[[j]]$element)) )
})
任何帮助将不胜感激。如果解决了类似问题,我们深表歉意。我已经搜索了互联网,但是没有找到它,或者没有理解使它可以转让给我的解决方案。
答案 0 :(得分:3)
通过创建唯一的 element 值的辅助矩阵(通过唯一的 group 值并在每个值中分配一个值)来创建矩阵,从而考虑矩阵乘法x %*% y
(请参阅?matmult
)相应的单元格。然后将矩阵乘法作为与自身的转置,然后是行和列的子集:
# EMPTY MATRIX
helper_mat <- matrix(0, nrow=length(unique(element)), ncol=length(unique(group)),
dimnames=list(unique(element), unique(group)))
# ASSIGN 1's AT SELECT LOCATIONS
for(i in seq_along(site)) {
helper_mat[element[i], group[i]] <- 1
}
helper_mat
# A1 A2 A3 B1 B2
# red 1 0 0 1 0
# orange 1 0 0 1 0
# blue 0 1 0 0 1
# black 0 1 1 0 0
# white 0 1 0 0 1
# cream 0 0 1 0 0
# yellow 0 0 1 0 0
# purple 0 0 1 0 0
# gray 0 0 0 0 1
# salmon 0 0 0 0 1
# MATRIX MULTIPLICATION WITH SUBSET
final_mat <- t(helper_mat) %*% helper_mat
final_mat <- final_mat[grep("B", rownames(final_mat)), grep("A", colnames(final_mat))]
final_mat
# A1 A2 A3
# B1 2 0 0
# B2 0 2 0
感谢@Lamia,甚至版本更短:
helper_mat <- table(element, group)
final_mat <- t(helper_mat) %*% helper_mat # ALTERNATIVELY: crossprod(helper_mat)
final_mat <- final_mat[grep("B", rownames(final_mat)), grep("A", colnames(final_mat))]
final_mat
# group
# group A1 A2 A3
# B1 2 0 0
# B2 0 2 0
答案 1 :(得分:1)
# example dataset
site<-c('A','A','A','A','A','A','A','A','A','B','B','B','B','B','B')
group<-c('A1','A1','A2','A2','A2','A3','A3','A3','A3',
'B1','B1','B2','B2','B2','B2')
element<-c("red","orange","blue","black","white", "black","cream","yellow","purple","red","orange","blue","white","gray","salmon")
d<-cbind(site,group,element)
library(tidyverse)
# save as dataframe
d = data.frame(d)
expand.grid(groupA = unique(d$group[d$site=="A"]),
groupB = unique(d$group[d$site=="B"])) %>% # get all combinations of A and B columns
rowwise() %>% # for each row
mutate(counts = length(intersect(d$element[d$group==groupA],
d$element[d$group==groupB]))) %>% # count common elements
spread(groupA, counts) %>% # reshape data
data.frame() %>%
column_to_rownames("groupB")
# A1 A2 A3
# B1 2 0 0
# B2 0 2 0
您可以使用向量化的函数代替rowwise
,该函数将(自动)应用于每一行,如下所示:
# create a function and vectorise it
CountCommonElements = function(x, y) length(intersect(d$element[d$group==x], d$element[d$group==y]))
CountCommonElements = Vectorize(CountCommonElements)
expand.grid(groupA = unique(d$group[d$site=="A"]),
groupB = unique(d$group[d$site=="B"])) %>%
mutate(counts = CountCommonElements(groupA, groupB)) %>%
spread(groupA, counts) %>%
data.frame() %>%
column_to_rownames("groupB")
# A1 A2 A3
# B1 2 0 0
# B2 0 2 0
答案 2 :(得分:1)
类似于@Parfait使用矩阵乘法的方法。您可能需要处理数据生成以将其扩展到您的应用程序:
site<-c('A','A','A','A','A','A','A','A','A','B','B','B','B','B','B')
group<-c('A1','A1','A2','A2','A2','A3','A3','A3','A3',
'B1','B1','B2','B2','B2','B2')
element<-c("red","orange","blue","black","white", "black","cream","yellow","purple","red","orange","blue","white","gray","salmon")
d<-data.frame(group, el = as.factor(element), stringsAsFactors = FALSE)
As <- d[group %in% paste0("A", 1:3), ]
Bs <- d[group %in% paste0("B", 1:2), ]
A_mat <- as.matrix(table(As))
B_mat <- as.matrix(table(Bs))
结果:
> A_mat
el
group black blue cream gray orange purple red salmon white yellow
A1 0 0 0 0 1 0 1 0 0 0
A2 1 1 0 0 0 0 0 0 1 0
A3 1 0 1 0 0 1 0 0 0 1
> B_mat
el
group black blue cream gray orange purple red salmon white yellow
B1 0 0 0 0 1 0 1 0 0 0
B2 0 1 0 1 0 0 0 1 1 0
> B_mat %*% t(A_mat)
group
group A1 A2 A3
B1 2 0 0
B2 0 2 0