Summing the cells of a matrix according to different hierarchical levels

Time: 2015-05-11 20:41:21

Tags: r matrix sum

I am using R to make a heatmap from binary interactions. The matrix looks like this:

     9   401 562 68  71  569 700
9    0   1   0   0   0   0   1
401  0   0   1   0   0   na  1
562  0   1   0   1   1   0   1
68   1   1   0   0   0   0   1
71   1   na  0   0   na  0   1
569  1   1   0   1   0   0   0
700  0   0   0   0   0   0   0

In addition, I have metadata that corresponds to my IDs:

     compart group family category
9    Ex      Prt   A      Ps
401  Ex      Prt   A      Ps
562  Ex      Prt   B      Rh
68   In      Prt   C      En
71   In      Act   D      Stp
569  In      Act   D      Stp
700  Ex      Act   E      Aqua

I would like to sum the cells according to different levels of this hierarchy, for example by family, and then do the same at the compart level, and so on.

I am looking for a solution that saves me from doing this by hand and spending hours on it.

1 answer:

Answer 0: (score: 1)

Your best bet is to flatten, or "lengthen", the matrix. Try the following:

library(magrittr)
library(data.table)
library(reshape2)

## Let Ids be the metadata data.frame (its row names are the IDs)
DT_ids <- as.data.table(Ids, keep.rownames=TRUE)
# DT_ids[, rn := as.numeric(rn)]
setkey(DT_ids, rn)

## Let M be the interactions matrix
## Reshape the interactions data into a tall data.table
DT_interactions <- M %>% 
            as.data.table(keep.rownames=TRUE) %>% 
            melt(id.vars = "rn", value.name="interaction")
## Clean up the column names
setnames(DT_interactions, c("rn", "variable"),  c("rn.rows", "rn.cols"))

## Add in two copies of the meta data
## one for "rows" of M and one for "cols" of M
DT_interactions[, paste0(names(DT_ids), ".rows") :=  DT_ids[.(rn.rows)]]
DT_interactions[, paste0(names(DT_ids), ".cols") :=  DT_ids[.(rn.cols)]]

## Set the key of DT_interactions
setkey(DT_interactions, rn.rows, rn.cols)

## NOW TO SUM UP
DT_interactions[, sum(interaction), by=c("family.rows", "family.cols")]

I would wrap that last part in a nice function:

sumByMeta <- function(..., na.rm=TRUE) {
  byCols_simple <- list(...) %>% unlist
  ## Grouping columns for the row side and the column side of M
  rowCols <- paste0(byCols_simple, ".rows")
  colCols <- paste0(byCols_simple, ".cols")
  byCols  <- c(rowCols, colCols)

  ## e.g. "family.rows + group.rows ~ family.cols + group.cols"
  ## (this form also holds for three or more levels)
  formula <- paste(paste(rowCols, collapse=" + "), "~",
                   paste(colCols, collapse=" + "))

  DT_interactions[, sum(interaction, na.rm=na.rm), by=byCols] %>% 
    dcast.data.table(formula=as.formula(formula), value.var="V1") %>%
    setnames(old=seq_along(byCols_simple), new=byCols_simple) %>% {.}
}

## EG: 

sumByMeta("family")
#      family A B C D E
#   1:      A 1 1 0 0 2
#   2:      B 1 0 1 1 1
#   3:      C 2 0 0 0 1
#   4:      D 3 0 1 0 1 
#   5:      E 0 0 0 0 0

## Try running these
sumByMeta("family")
sumByMeta("group")
sumByMeta("family", "group")
sumByMeta("family", "group", "compart")
sumByMeta("family",          "compart")
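
If you want to cross-check the result without data.table, here is a minimal base R sketch. It rebuilds the question's matrix and metadata by hand (assuming the "na" cells are meant to be NA) and collapses rows and then columns with rowsum(). The helper name sumByLevel is hypothetical, not part of the answer above, and it treats missing interactions as 0, which mirrors na.rm=TRUE.

```r
## Rebuild the question's inputs; NA stands in for the "na" cells (assumption)
ids <- c("9", "401", "562", "68", "71", "569", "700")
M <- matrix(c(
  0, 1,  0, 0, 0,  0,  1,
  0, 0,  1, 0, 0,  NA, 1,
  0, 1,  0, 1, 1,  0,  1,
  1, 1,  0, 0, 0,  0,  1,
  1, NA, 0, 0, NA, 0,  1,
  1, 1,  0, 1, 0,  0,  0,
  0, 0,  0, 0, 0,  0,  0
), nrow = 7, byrow = TRUE, dimnames = list(ids, ids))

Ids <- data.frame(
  compart  = c("Ex", "Ex", "Ex", "In", "In", "In", "Ex"),
  group    = c("Prt", "Prt", "Prt", "Prt", "Act", "Act", "Act"),
  family   = c("A", "A", "B", "C", "D", "D", "E"),
  category = c("Ps", "Ps", "Rh", "En", "Stp", "Stp", "Aqua"),
  row.names = ids, stringsAsFactors = FALSE
)

## Hypothetical helper: sum M within row groups, then within column groups
sumByLevel <- function(M, meta, level) {
  gr <- meta[rownames(M), level]      # group label of each row ID
  gc <- meta[colnames(M), level]      # group label of each column ID
  M0 <- M
  M0[is.na(M0)] <- 0                  # count missing interactions as 0
  by_rows <- rowsum(M0, group = gr)   # collapse rows that share a label
  t(rowsum(t(by_rows), group = gc))   # collapse columns that share a label
}

fam <- sumByLevel(M, Ids, "family")
print(fam)
print(sumByLevel(M, Ids, "compart"))
```

For "family" this should give the same 5 x 5 table as sumByMeta("family") above. It only handles one level at a time; for combinations such as family plus group, the data.table version is the more convenient route.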