R变换矩阵求和随机分布的列对

时间:2014-10-03 11:57:29

标签: r matrix transformation

我需要如下变换相对大的矩阵(colnum> 30)。设A为矩阵

  

A

              C          C          C          I          I           C           C           I          I          C
[1,] -1.4922530 -0.7777630  0.6179047  1.2980683 -0.2689602  0.62562747 -0.15302102 -0.05579989 -1.5000136 -1.9108030
[2,]  1.8023243 -1.1731071 -0.4516662 -0.4700537  1.0181240  0.06484149 -0.45775976  0.05201139 -0.6803911  1.7147639
[3,]  1.1998167 -0.3753293  1.4655604  0.4930142 -1.6840020 -0.65790455  0.12047651 -0.03418886 -1.4720201 -1.4445862
[4,]  0.2836066  0.8091034 -0.9282385 -0.7789458 -0.7074625 -1.00048502  0.08851702  0.03721331  0.1473371 -0.3057062

并且需要创建一个新矩阵,将I列对(A [,4:5]和A [,8:=]在此示例中)相加,同时保留" C"列未触及,即:

              C          C          C          I           C           C          I          C
[1,] -1.4922530 -0.7777630  0.6179047  1.0291081  0.62562747 -0.15302102 -1.5558135 -1.9108030
[2,]  1.8023243 -1.1731071 -0.4516662  0.5480702  0.06484149 -0.45775976 -0.6283797  1.7147639
[3,]  1.1998167 -0.3753293  1.4655604 -1.1909878 -0.65790455  0.12047651 -1.5062090 -1.4445862
[4,]  0.2836066  0.8091034 -0.9282385 -1.4864083 -1.00048502  0.08851702  0.1845504 -0.3057062

请注意" I"列对可以随机出现。 谢谢你提前注意。

2 个答案:

答案 0 :(得分:1)

这里我使用表达式在仅提取" I"之后提取列对(2 * i-1,2 * i)。原始矩阵中的列:

## get the id of columns having I     
id <- grep("I",colnames(dat))
## substract original matrix
xx <-  dat[,id]
## sum columns by pair , using sapply (maybe it is not the very efficient )
xx <- sapply(seq_len(ncol(xx)/2), function(i) rowSums(xx[,c(2*i-1,2*i)]))

        [,1]       [,2]
[1,]  1.0291081 -1.5558135
[2,]  0.5480703 -0.6283797
[3,] -1.1909878 -1.5062090
[4,] -1.4864083  0.1845504

然后在原始的矩阵中替换这个新的I矩阵:

## use recyclying to replace and remove columns from the original matrix
## the order is important here : replace then remove
dat[,id[c(T,F)]] <- xx           
dat <- dat[,-id[c(F,T)]]  


            C        C.1        C.2          I         C.3         C.4        I.2        C.5
[1,] -1.4922530 -0.7777630  0.6179047  1.0291081  0.62562747 -0.15302102 -1.5558135 -1.9108030
[2,]  1.8023243 -1.1731071 -0.4516662  0.5480703  0.06484149 -0.45775976 -0.6283797  1.7147639
[3,]  1.1998167 -0.3753293  1.4655604 -1.1909878 -0.65790455  0.12047651 -1.5062090 -1.4445862
[4,]  0.2836066  0.8091034 -0.9282385 -1.4864083 -1.00048502  0.08851702  0.1845504 -0.3057062

答案 1 :(得分:1)

你也可以这样做:

indx <- grepl("I", colnames(A))
indx1 <- cumsum(c(1,abs(diff(indx))))
res <- do.call(cbind,lapply(split(seq_along(indx), indx1),function(i) {
         A1 <- A[,i, drop=FALSE]
        if(all(colnames(A1)%in% "I")) 
              matrix(rowSums(A1),ncol=1, dimnames=list(NULL, "I"))
              else A1}))

res
#             C           C          C         I          C          C
#[1,]  1.3709584  0.40426832  2.0184237 -1.673114 -0.3066386  1.8951935
#[2,] -0.5646982 -0.10612452 -0.0627141 -2.935244 -1.7813084 -0.4304691
#[3,]  0.3631284  1.51152200  1.3048697 -2.573788 -0.1719174 -0.2572694
#[4,]  0.6328626 -0.09465904  2.2866454  1.956064  1.2146747 -1.7631631
#              I           C
#[1,]  1.4952009 -0.78445901
#[2,] -1.2489213 -0.85090759
#[3,]  0.9604052 -2.41420765
#[4,] -1.0121713  0.03612261

数据

set.seed(42)
A <- matrix(rnorm(10*4), ncol=10, dimnames=list(NULL, 
             c(rep("C",3), "I","I", "C","C", "I", "I", "C")))