In R

Time: 2016-06-01 08:59:25

Tags: r matrix vector

My problem is that, in each iteration i of a loop, a matrix like this one is produced:

structure(c(8L, 4L, 3L, 4L, 1L, 8L, 28L, 32L, 24L, 32L, 8L, 64L, 
0L, 6L, 12L, 16L, 4L, 32L, 0L, 0L, 3L, 12L, 3L, 24L, 0L, 0L, 
0L, 6L, 4L, 32L, 0L, 0L, 0L, 0L, 0L, 8L, 0L, 0L, 0L, 0L, 0L, 
28L), .Dim = 6:7, .Dimnames = structure(list(c("ESN", "GWD", 
"LWK", "MSL", "PEL", "YRI"), c("ACB", "ESN", "GWD", "LWK", "MSL", 
"PEL", "YRI")), .Names = c("", "")), class = "table")

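This matrix holds counts. These counts should now be added to a larger table, which has more levels than the 7 in this table. The result is always a symmetric matrix, so the upper triangle can be ignored.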

The real table (all elements initialized to 0):

matr<-matrix(0,nrow=26,ncol=26)
pop<-c("CHB","JPT","CHS","CDX","KHV","CEU","TSI","FIN","GBR","IBS","YRI","LWK","GWD","MSL","ESN","ASW","ACB","MXL","PUR","CLM","PEL","GIH","PJL","BEB","STU","ITU")

rownames(matr)<-pop
colnames(matr)<-pop

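Can someone tell me how to add these counts from the small table to the big table (in the correct fields) efficiently? I need to update the table 100k times, so efficiency matters. As mentioned above, adding only to the lower triangle is fine...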

EDIT #####

So another data set might look like this (it would be generated by the next iteration of the loop):

structure(c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L), .Dim = c(3L, 
3L), .Dimnames = structure(list(c("IBS", "MXL", "TSI"), c("GBR", 
"IBS", "MXL")), .Names = c("", "")), class = "table")

This should then also be added to matr. If a field already holds a number, the two numbers should be summed.

Thanks

1 answer:

Answer 0: (score: 1)

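Accounting for duplicate/non-equal/non-zero entries in each "table" created by the iteration, and updating only the lower.tri of "matr":

library(Matrix)  ## provides sparseMatrix()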
for(tab in tabs) {
     ## if each 'tab' is large enough, a 'rep(, each = )' could be used
     ## instead of creating (and subsetting with) 'row(tab)' and 'col(tab)'
     i = match(rownames(tab), rownames(mat))[row(tab)]
     j = match(colnames(tab), colnames(mat))[col(tab)]

     ## to fill only the 'lower.tri'
     ii = pmax(i, j); jj = pmin(i, j)

     ## sum duplicate entries 'tab' with 'sparseMatrix's intrinsic 'xtabs'-like behaviour
     ijx = summary(sparseMatrix(ii, jj, x = c(tab)))

     ## subset and assign with a matrix index updating previous entries
     ij = cbind(ijx$i, ijx$j)
     mat[ij] = mat[ij] + ijx$x
}
mat
#  a  b c d e
#a 0  0 0 0 0
#b 4  1 0 0 0
#c 6  7 2 0 0
#d 5 12 5 7 0
#e 4  6 3 3 0
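
The aggregation step matters because, once both (i, j) and (j, i) are folded into the lower triangle, the same target cell can appear more than once in the index. A minimal sketch of the pitfall, using a toy 2 x 2 matrix and base R's rowsum() in place of sparseMatrix (all names below are illustrative and not part of the answer's code):

```r
## Toy setup: two contributions target the same cell (2, 1)
m  <- matrix(0, 2, 2)
ij <- rbind(c(2, 1), c(2, 1))   # duplicated (row, col) pair
x  <- c(5, 7)

## Naive update: with duplicated indices only one contribution survives
naive <- m
naive[ij] <- naive[ij] + x
naive[2, 1]                     # not 12

## Aggregate per target cell first, then update each cell exactly once
key   <- paste(ij[, 1], ij[, 2])
agg   <- rowsum(x, key)         # one-column matrix, rows named by key
first <- !duplicated(key)
uij   <- ij[first, , drop = FALSE]
fixed <- m
fixed[uij] <- fixed[uij] + agg[key[first], 1]
fixed[2, 1]                     # 12
```

Indexing the aggregated sums by name (agg[key[first], 1]) keeps them aligned with the unique index rows even though rowsum() sorts its groups.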

Here, "tabs" is a "list" containing the iteratively created "table"s:

set.seed(007)            
tabs = replicate(3, table(replicate(2, 
                                    sample(letters[1:5], 50, TRUE), simplify = FALSE))[
                                        sample(5, sample(2:5, 1)), sample(5, sample(2:5, 1))], 
                 simplify = FALSE)

And "mat" is a smaller version of "matr":

mat = matrix(0L, 5, 5, dimnames = replicate(2, letters[1:5], simplify = FALSE))
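
For reference, the same idea might be applied to the 26 x 26 "matr" and the two tables from the question roughly as follows. This is only a sketch: add_counts() is a hypothetical helper, it uses base R's rowsum() on a linear index rather than sparseMatrix, and it assumes every row and column label of a table occurs in "pop".

```r
pop <- c("CHB","JPT","CHS","CDX","KHV","CEU","TSI","FIN","GBR","IBS","YRI",
         "LWK","GWD","MSL","ESN","ASW","ACB","MXL","PUR","CLM","PEL","GIH",
         "PJL","BEB","STU","ITU")
matr <- matrix(0, nrow = 26, ncol = 26, dimnames = list(pop, pop))

## Hypothetical helper: add one small table into the lower triangle of 'big'
add_counts <- function(big, tab) {
  i <- match(rownames(tab), rownames(big))[row(tab)]
  j <- match(colnames(tab), colnames(big))[col(tab)]
  stopifnot(!anyNA(i), !anyNA(j))      # assumption: all labels are known
  ii <- pmax(i, j); jj <- pmin(i, j)   # fold everything below the diagonal
  k  <- (jj - 1L) * nrow(big) + ii     # column-major linear index into 'big'
  s  <- rowsum(as.vector(tab), k)      # sum contributions per target cell
  idx <- as.integer(rownames(s))
  big[idx] <- big[idx] + s[, 1]
  big
}

## The two tables from the question
tab1 <- structure(c(8L, 4L, 3L, 4L, 1L, 8L, 28L, 32L, 24L, 32L, 8L, 64L,
  0L, 6L, 12L, 16L, 4L, 32L, 0L, 0L, 3L, 12L, 3L, 24L, 0L, 0L,
  0L, 6L, 4L, 32L, 0L, 0L, 0L, 0L, 0L, 8L, 0L, 0L, 0L, 0L, 0L,
  28L), .Dim = 6:7, .Dimnames = structure(list(c("ESN", "GWD",
  "LWK", "MSL", "PEL", "YRI"), c("ACB", "ESN", "GWD", "LWK", "MSL",
  "PEL", "YRI")), .Names = c("", "")), class = "table")
tab2 <- structure(c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L), .Dim = c(3L,
  3L), .Dimnames = structure(list(c("IBS", "MXL", "TSI"), c("GBR",
  "IBS", "MXL")), .Names = c("", "")), class = "table")

for (tab in list(tab1, tab2)) matr <- add_counts(matr, tab)
matr["ESN", "GWD"]   # 32: tab1["GWD", "ESN"] + tab1["ESN", "GWD"]
```

Because the helper returns the updated matrix, it can be called once per loop iteration; whether it is fast enough for 100k updates would need to be measured on the real data.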