我需要如下变换相对大的矩阵(colnum> 30)。设A为矩阵
A
C C C I I C C I I C
[1,] -1.4922530 -0.7777630 0.6179047 1.2980683 -0.2689602 0.62562747 -0.15302102 -0.05579989 -1.5000136 -1.9108030
[2,] 1.8023243 -1.1731071 -0.4516662 -0.4700537 1.0181240 0.06484149 -0.45775976 0.05201139 -0.6803911 1.7147639
[3,] 1.1998167 -0.3753293 1.4655604 0.4930142 -1.6840020 -0.65790455 0.12047651 -0.03418886 -1.4720201 -1.4445862
[4,] 0.2836066 0.8091034 -0.9282385 -0.7789458 -0.7074625 -1.00048502 0.08851702 0.03721331 0.1473371 -0.3057062
并且需要创建一个新矩阵,将I列对(A [,4:5]和A [,8:=]在此示例中)相加,同时保留" C"列未触及,即:
C C C I C C I C
[1,] -1.4922530 -0.7777630 0.6179047 1.0291081 0.62562747 -0.15302102 -1.5558135 -1.9108030
[2,] 1.8023243 -1.1731071 -0.4516662 0.5480702 0.06484149 -0.45775976 -0.6283797 1.7147639
[3,] 1.1998167 -0.3753293 1.4655604 -1.1909878 -0.65790455 0.12047651 -1.5062090 -1.4445862
[4,] 0.2836066 0.8091034 -0.9282385 -1.4864083 -1.00048502 0.08851702 0.1845504 -0.3057062
请注意" I"列对可以随机出现。 谢谢你提前注意。
答案 0 :(得分:1)
这里我使用表达式在仅提取" I"之后提取列对(2 * i-1,2 * i)。原始矩阵中的列:
## get the id of columns having I
id <- grep("I",colnames(dat))
## substract original matrix
xx <- dat[,id]
## sum columns by pair , using sapply (maybe it is not the very efficient )
xx <- sapply(seq_len(ncol(xx)/2), function(i) rowSums(xx[,c(2*i-1,2*i)]))
[,1] [,2]
[1,] 1.0291081 -1.5558135
[2,] 0.5480703 -0.6283797
[3,] -1.1909878 -1.5062090
[4,] -1.4864083 0.1845504
然后在原始的矩阵中替换这个新的I矩阵:
## use recyclying to replace and remove columns from the original matrix
## the order is important here : replace then remove
dat[,id[c(T,F)]] <- xx
dat <- dat[,-id[c(F,T)]]
C C.1 C.2 I C.3 C.4 I.2 C.5
[1,] -1.4922530 -0.7777630 0.6179047 1.0291081 0.62562747 -0.15302102 -1.5558135 -1.9108030
[2,] 1.8023243 -1.1731071 -0.4516662 0.5480703 0.06484149 -0.45775976 -0.6283797 1.7147639
[3,] 1.1998167 -0.3753293 1.4655604 -1.1909878 -0.65790455 0.12047651 -1.5062090 -1.4445862
[4,] 0.2836066 0.8091034 -0.9282385 -1.4864083 -1.00048502 0.08851702 0.1845504 -0.3057062
答案 1 :(得分:1)
你也可以这样做:
indx <- grepl("I", colnames(A))
indx1 <- cumsum(c(1,abs(diff(indx))))
res <- do.call(cbind,lapply(split(seq_along(indx), indx1),function(i) {
A1 <- A[,i, drop=FALSE]
if(all(colnames(A1)%in% "I"))
matrix(rowSums(A1),ncol=1, dimnames=list(NULL, "I"))
else A1}))
res
# C C C I C C
#[1,] 1.3709584 0.40426832 2.0184237 -1.673114 -0.3066386 1.8951935
#[2,] -0.5646982 -0.10612452 -0.0627141 -2.935244 -1.7813084 -0.4304691
#[3,] 0.3631284 1.51152200 1.3048697 -2.573788 -0.1719174 -0.2572694
#[4,] 0.6328626 -0.09465904 2.2866454 1.956064 1.2146747 -1.7631631
# I C
#[1,] 1.4952009 -0.78445901
#[2,] -1.2489213 -0.85090759
#[3,] 0.9604052 -2.41420765
#[4,] -1.0121713 0.03612261
set.seed(42)
A <- matrix(rnorm(10*4), ncol=10, dimnames=list(NULL,
c(rep("C",3), "I","I", "C","C", "I", "I", "C")))