Hi all, I have a question about matrix operations in R. I have a dataset like the following:
d <- data.frame(id=c(2,3,4,5,6,6,8,11,11,11,12,12,12),author=c("FN","VM","VA","FK","VM","SM","FK","FK","VB","VA","FK","VB","VA"))
d
   id author
1   2     FN
2   3     VM
3   4     VA
4   5     FK
5   6     VM
6   6     SM
7   8     FK
8  11     FK
9  11     VB
10 11     VA
11 12     FK
12 12     VB
13 12     VA
> m <- xtabs(~author+id,d)
> m
      id
author 2 3 4 5 6 8 11 12
    FK 0 0 0 1 0 1  1  1
    FN 1 0 0 0 0 0  0  0
    SM 0 0 0 0 1 0  0  0
    VA 0 0 1 0 0 0  1  1
    VB 0 0 0 0 0 0  1  1
    VM 0 1 0 0 1 0  0  0
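What I want to do is generate every pairwise combination of the authors listed in column 2 and multiply the two corresponding rows together. For example, for the FK-VA pair, the corresponding rows of the matrix are: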
FK 0 0 0 1 0 1 1 1
VA 0 0 1 0 0 0 1 1
The expected result in my matrix is the element-wise product of the two rows:
FK-VA (0*0),(0*0),(0*1),(1*0),(0*0),(1*0),(1*1),(1*1)
FK-VA 0 0 0 0 0 0 1 1
FK FN 0 0 0 0 0 0 0 0
FK SM 0 0 0 0 0 0 0 0
FK VA 0 0 0 0 0 0 1 1
FK VB 0 0 0 0 0 0 1 1
FK VM 0 0 0 0 0 0 0 0
FN SM 0 0 0 0 0 0 0 0
FN VA 0 0 0 0 0 0 0 0
FN VB 0 0 0 0 0 0 0 0
FN VM 0 0 0 0 0 0 0 0
SM VA 0 0 0 0 0 0 0 0
SM VB 0 0 0 0 0 0 0 0
SM VM 0 0 0 0 1 0 0 0
VA VB 0 0 0 0 0 0 1 1
VA VM 0 0 0 0 0 0 0 0
VB VM 0 0 0 0 0 0 0 0
As you can see, I need help with steps 2 and 3.
Thanks,
Mario
Answer 0 (score: 0)
This may help:
indx <- combn(dimnames(m)$author,2)
res <- cbind(t(indx), as.data.frame(do.call(rbind,
lapply(split(indx, col(indx)), function(x) m[x[1],]*m[x[2],]))))
colnames(res)[1:2] <- paste0('author', 1:2)
head(res,3)
# author1 author2 2 3 4 5 6 8 11 12
#1 FK FN 0 0 0 0 0 0 0 0
#2 FK SM 0 0 0 0 0 0 0 0
#3 FK VA 0 0 0 0 0 0 1 1
Or, passing the function to combn directly:
cbind(t(indx),as.data.frame(t(combn(dimnames(m)$author,2,
FUN=function(x) m[x[1],] * m[x[2],]))))
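For readers who want to try the idea without the surrounding context, here is a minimal self-contained sketch of the same pairwise row product on the toy data from the question. The names `pairs` and `prod_mat` are made up for this illustration, and `vapply` is swapped in for the `lapply`/`do.call(rbind, ...)` combination used above; it is one possible way to write it, not the only one.

```r
# Toy data copied from the question
d <- data.frame(
  id = c(2, 3, 4, 5, 6, 6, 8, 11, 11, 11, 12, 12, 12),
  author = c("FN", "VM", "VA", "FK", "VM", "SM", "FK",
             "FK", "VB", "VA", "FK", "VB", "VA")
)
m <- xtabs(~ author + id, d)

# All unordered author pairs, one pair per column (hypothetical name)
pairs <- combn(rownames(m), 2)

# Element-wise product of the two rows of each pair. vapply returns
# one column per pair, so transpose to get one row per pair.
prod_mat <- t(vapply(
  seq_len(ncol(pairs)),
  function(j) as.numeric(m[pairs[1, j], ] * m[pairs[2, j], ]),
  numeric(ncol(m))
))
dimnames(prod_mat) <- list(paste(pairs[1, ], pairs[2, ], sep = "-"),
                           colnames(m))

prod_mat["FK-VA", ]
```

Keeping the result as a numeric matrix with pair labels as row names makes later steps such as `rowSums` or `colSums` work without having to drop label columns first.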
If you want to keep only the rows that have at least one nonzero value:
res1 <- res[!!rowSums(res[,-(1:2)]),]
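The `!!rowSums(...)` idiom can be hard to read at first: the double negation turns each row sum into a logical, so any nonzero sum becomes TRUE. The sketch below compares it with a more explicit spelling on a small made-up data frame (`res_demo` and its values are hypothetical, chosen only to mimic the shape of `res`).

```r
# Hypothetical miniature of `res`: two label columns plus product columns
res_demo <- data.frame(
  author1 = c("FK", "FK", "SM"),
  author2 = c("FN", "VA", "VM"),
  `11` = c(0, 1, 0),
  `12` = c(0, 1, 0),
  `6`  = c(0, 0, 1),
  check.names = FALSE
)

counts <- res_demo[, -(1:2)]             # drop the two label columns

keep_double_bang <- !!rowSums(counts)    # numeric -> logical via double negation
keep_explicit    <- rowSums(counts) != 0 # same mask, spelled out

res_demo[keep_explicit, ]
```

Both masks select the same rows; the explicit comparison is arguably easier to read later.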
For the sum, you can use rowSums:
res$Sum <- rowSums(res[,-(1:2)])
head(res,3)
# author1 author2 2 3 4 5 6 8 11 12 Sum
#1 FK FN 0 0 0 0 0 0 0 0 0
#2 FK SM 0 0 0 0 0 0 0 0 0
#3 FK VA 0 0 0 0 0 0 1 1 2
As for dividing res1 by CL:
CL <- colSums(res1[,-(1:2)])
CL <- CL-1
CL[ CL<1 ] <- 0
res1[-(1:2)]/CL[col(res1[-(1:2)])]
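The column-wise division above can also be written with `sweep`, which some readers find clearer than indexing `CL` with `col(...)`. The sketch below uses a small made-up matrix `x` in place of the numeric part of `res1` (the values are hypothetical). Note that any column whose adjusted `CL` is 0 produces `NaN` (0/0) or `Inf` (nonzero/0), so that case may need separate handling depending on what is wanted.

```r
# Hypothetical miniature of the numeric part of `res1` (filled column-wise)
x <- matrix(c(1, 1, 0,
              1, 1, 1,
              0, 0, 1),
            nrow = 3,
            dimnames = list(NULL, c("6", "11", "12")))

CL <- colSums(x) - 1   # column totals minus one, as in the answer
CL[CL < 1] <- 0        # totals below 1 are set to 0

by_index <- x / CL[col(x)]        # the approach used above
by_sweep <- sweep(x, 2, CL, "/")  # divide each column by its CL entry

by_sweep
```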
For the new dataset:
d <- read.csv('AuthorsRevised.csv', stringsAsFactors=FALSE)
m <- xtabs(~Authors+ID,d)
indx <- combn(dimnames(m)$Authors,2)
dim(indx)
#[1] 2 435711
res <- cbind(t(indx), as.data.frame(do.call(rbind,
lapply(split(indx, col(indx)), function(x) m[x[1],]*m[x[2],]))))
colnames(res)[1:2] <- paste0('author', 1:2)
dim(res)
#[1] 435711 534
res[1:3,1:3]
# author1 author2 1
#1 Abe S.-i. Achterberg W. 0
#2 Abe S.-i. Adebowale B.O.A. 0
#3 Abe S.-i. Aghion P. 0
If you only want the sum, another option is:
t1 <- crossprod(table(d))
t1[upper.tri(t1, diag=TRUE)] <- NA
library(reshape2)
res1 <- melt(t1, na.rm=TRUE)[,c(2:1,3)]
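To see why `crossprod` gives the pair sums directly, here is a small sketch on the toy data from the question. `table(d)` is an id-by-author incidence table, so `t(X) %*% X` counts, for each pair of authors, the ids they share. The long-format step uses base R (`which` with `arr.ind = TRUE`) as a stand-in for `reshape2::melt`; the names `co`, `idx`, and `long` are made up for this illustration.

```r
# Toy data copied from the question
d <- data.frame(
  id = c(2, 3, 4, 5, 6, 6, 8, 11, 11, 11, 12, 12, 12),
  author = c("FN", "VM", "VA", "FK", "VM", "SM", "FK",
             "FK", "VB", "VA", "FK", "VB", "VA")
)

# author x author matrix: entry [a, b] is the number of ids shared by a and b
co <- crossprod(table(d))

# Keep each unordered pair once (strict lower triangle), in long format.
# Positional indexing is used because the column names of `idx`
# depend on whether the dimnames of `co` are themselves named.
idx <- which(lower.tri(co), arr.ind = TRUE)
long <- data.frame(
  author1 = colnames(co)[idx[, 2]],
  author2 = rownames(co)[idx[, 1]],
  Sum     = co[idx]
)

long[long$Sum > 0, ]
```

This avoids building the full pair-by-id table, which matters for the larger dataset with 435711 pairs.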