Subsetting a matrix using a rownames mapping and a user-defined function

Time: 2016-11-01 09:42:25

Tags: r matrix dataframe data.table apply

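I have a matrix and would like to subset it using a mapping and a function.

Example: a matrix filled randomly with runif, using set.seed for reproducibility.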

set.seed(1)
exp.mat <- matrix(runif(9*6, 5.0, 10), nrow = 9, ncol = 6)
rownames(exp.mat) <- c('a','b1','b2','b3','c','d1','d2','e1','e2')
colnames(exp.mat) <- c('s1','s2','s3','s4','s5','s6')

exp.mat
         s1       s2       s3       s4       s5       s6
a  5.353395 6.661973 6.733417 8.562573 6.198147 8.024666
b1 5.497331 8.254352 6.668875 6.999972 5.294672 8.273620
b2 6.581359 6.290084 7.381756 6.626761 8.211441 6.765986
b3 7.593171 7.392726 9.460992 8.785436 9.381346 6.351301
c  8.310025 8.831553 9.321697 6.013461 8.894573 9.963420
d1 7.034151 5.421235 6.949948 8.555606 8.986544 8.167466
d2 9.564380 9.376607 8.886603 5.608460 7.276372 6.066041
e1 6.468017 6.695365 9.803090 6.227443 7.050420 5.646862
e2 7.295329 9.197202 7.173297 5.716522 9.054351 7.390590

A mapping whose column rown contains the rownames of the original matrix and whose column map contains the corresponding mapped names.

maps <- data.frame(rown=c('a','b1','b2','b3','c','d1','d2','e1','e1','e1'), 
                   map =c('a','b','b','b','c','d','d','e','f','g'))
maps

   rown map
 1    a   a
 2   b1   b
 3   b2   b
 4   b3   b
 5    c   c
 6   d1   d
 7   d2   d
 8   e1   e
 9   e1   f
10   e1   g

Function: mean is used here to select a row when several rows share the same mapping (case 2).

apply(exp.mat, 1, mean)
       a       b1       b2       b3        c       d1       d2       e1       e2 
6.922362 6.831470 6.976231 8.160829 8.555789 7.519158 7.796410 6.981866 7.637882 
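
As a side note on the scoring step: for the specific case of mean, the vectorised rowMeans gives the same per-row values as apply(exp.mat, 1, mean), while apply stays useful when an arbitrary user-defined function is needed. A minimal sketch, assuming the same matrix shape and names as above (the names score.apply and score.fast are only illustrative):

```r
# Illustrative matrix with the same shape and names as in the question
set.seed(1)
exp.mat <- matrix(runif(9 * 6, 5.0, 10), nrow = 9, ncol = 6,
                  dimnames = list(c('a','b1','b2','b3','c','d1','d2','e1','e2'),
                                  paste0('s', 1:6)))

# Row-wise score with a generic function via apply()
score.apply <- apply(exp.mat, 1, mean)
# The same scores for the mean, computed by the vectorised rowMeans()
score.fast <- rowMeans(exp.mat)

# Both vectors carry the rownames and should agree numerically
all.equal(score.apply, score.fast)
```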

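Based on the mapping:

  1. If only one value in rown maps to a given value in map, the whole row should be copied as is. Example: a and c have only one mapping each.
  2. If multiple values in rown map to the same value in map, the whole row with the highest value of the function result above should be copied. Example: b1, b2 and b3 map to b; b3 has the highest mean, so b3 must be selected. Likewise, d2 is selected for d.
  3. If one value in rown maps to multiple values in map, those rows should be discarded. Example: e1 has multiple mapped values (e, f, g).
  4. If there is no mapping, the row is discarded. Example: e2 has no corresponding mapping.

Expected output: the subsetted matrix

> exp.mat.trans
        s1       s2       s3       s4       s5       s6
a 5.353395 6.661973 6.733417 8.562573 6.198147 8.024666
b 7.593171 7.392726 9.460992 8.785436 9.381346 6.351301
c 8.310025 8.831553 9.321697 6.013461 8.894573 9.963420
d 9.564380 9.376607 8.886603 5.608460 7.276372 6.066041

Please advise how to achieve this in an efficient way.

I have achieved this by eyeballing and the code below:

exp.mat.trans <- exp.mat[c(1,4,5,7),]
rownames(exp.mat.trans) <- c('a','b','c','d')

Might just identifying the indices be useful, since the values themselves are not transformed?

# Index Subsetting
ind <- c(1,4,5,7)
exp.mat.trans2 <- exp.mat[ind,]
rownames(exp.mat.trans2) <- maps[ind, 'map']

exp.mat.trans and exp.mat.trans2 are the same!

Edit

map may not be the same!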

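1 answer:

Answer 0: (score: 2)

If you want an efficient solution, I think it is best to use data.table for the mapping. When I run your code, the input matrix comes out different. I found the following solution to your problem: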

set.seed(1)
exp.mat <- matrix(runif(9*6, 5.0, 10), nrow = 9, ncol = 6)
rownames(exp.mat) <- c('a','b1','b2','b3','c','d1','d2','e1','e2')
colnames(exp.mat) <- c('s1','s2','s3','s4','s5','s6')
> exp.mat
         s1       s2       s3       s4       s5       s6
a  6.327543 5.308931 6.900176 6.911940 8.971199 8.946781
b1 6.860619 6.029873 8.887226 9.348454 5.539718 5.116656
b2 7.864267 5.882784 9.673526 6.701745 8.618555 7.386150
b3 9.541039 8.435114 6.060713 7.410401 7.056372 8.661569
c  6.008410 6.920519 8.258369 7.997829 9.104731 8.463658
d1 9.491948 8.849207 5.627775 7.467707 8.235301 7.388098
d2 9.723376 7.488496 6.336103 5.931088 8.914664 9.306047
e1 8.303989 8.588093 6.930570 9.136867 7.765182 7.190486
e2 8.145570 9.959530 5.066952 8.342334 7.648598 6.223986
library(data.table)

maps <- data.table(rown=c('a','b1','b2','b3','c','d1','d2','e1','e1'), 
                   map =c('a','b','b','b','c','d','d','e','f'))
# RULE 2: attach the mean of each matrix row to its mapping
maps[, value := rowMeans(exp.mat)[rown]]
# RULE 2: for each map value keep the rown with the highest mean (stored in V1)
maps <- maps[, rown[which.max(value)], by = map]
# RULE 3: count how many map values each selected row is assigned to
number_map <- maps[,.N, by = V1]
setkey(maps, "V1")
# RULE 3: keep only rows that are assigned to exactly one map value
maps <- maps[number_map[N < 2, V1]] 
# Now subset the matrix (rows without any mapping never appear in maps)
exp.mat.sub <- exp.mat[maps$V1[maps$V1 %in% rownames(exp.mat)],]
# Rename each kept row to its map value, looked up by the original rowname
rownames(exp.mat.sub) <- maps$map[match(rownames(exp.mat.sub), maps$V1)]
exp.mat.sub
         s1       s2       s3       s4       s5       s6
a  6.327543 5.308931 6.900176 6.911940 8.971199 8.946781
b  9.541039 8.435114 6.060713 7.410401 7.056372 8.661569
c  6.008410 6.920519 8.258369 7.997829 9.104731 8.463658
d  9.723376 7.488496 6.336103 5.931088 8.914664 9.306047
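
For comparison, the four rules from the question can also be sketched in base R, which additionally returns the result through row indices as the question suggests. This is only a sketch under assumptions: subset_by_map is a hypothetical helper name that does not come from the original post, and it reads rule 3 as "drop an ambiguous rown before the per-group maximum is taken", which can differ from the data.table code above when an ambiguous row is not the top scorer of its group. It also assumes at least one mapping survives the filtering.

```r
# Hypothetical helper (not from the original post): applies rules 1 to 4
# and returns the subsetted matrix with rows renamed to their map value.
subset_by_map <- function(mat, maps, fun = mean) {
  maps <- unique(as.data.frame(maps)[, c("rown", "map")])
  maps$rown <- as.character(maps$rown)
  maps$map  <- as.character(maps$map)
  # Rule 4: ignore mappings that do not refer to a row of the matrix;
  # matrix rows that never appear in maps are dropped implicitly
  maps <- maps[maps$rown %in% rownames(mat), , drop = FALSE]
  # Rule 3: drop every rown that maps to more than one map value
  n.targets <- table(maps$rown)
  maps <- maps[maps$rown %in% names(n.targets)[n.targets == 1], , drop = FALSE]
  # Score each remaining candidate row with the user-supplied function
  score <- apply(mat[maps$rown, , drop = FALSE], 1, fun)
  # Rules 1 and 2: per map value, keep the candidate with the highest score
  groups <- split(seq_len(nrow(maps)), maps$map)
  best <- vapply(groups, function(i) i[which.max(score[i])], integer(1))
  # Translate the winners into row indices of the original matrix
  ind <- match(maps$rown[best], rownames(mat))
  out <- mat[ind, , drop = FALSE]
  rownames(out) <- names(best)
  out
}

# Usage with the data from the question
set.seed(1)
exp.mat <- matrix(runif(9 * 6, 5.0, 10), nrow = 9, ncol = 6)
rownames(exp.mat) <- c('a','b1','b2','b3','c','d1','d2','e1','e2')
colnames(exp.mat) <- c('s1','s2','s3','s4','s5','s6')
maps <- data.frame(rown = c('a','b1','b2','b3','c','d1','d2','e1','e1','e1'),
                   map  = c('a','b','b','b','c','d','d','e','f','g'))

exp.mat.sub2 <- subset_by_map(exp.mat, maps)
rownames(exp.mat.sub2)
```

Passing a different function, for example fun = max or fun = median, changes only how the rows within a group are ranked.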