Question

I have a dataset generated as follows:

cn <- c("Cop-1", "Cop-2", "LEW-1", "Lew-3", "Cop-3", "SHR-2", "LEW-2", 
"SHRP-3", "SHRP-1")
rn <- paste(rep("Gene_", 4), 1:4, sep = "")
start <- matrix(nrow = 4, ncol = 9)
rownames(start) <- rn
colnames(start) <- cn
start[1, ] <- c(0, .01, 3, 4, 0.001, 11, 5, 15, 46)
start[2, ] <- c(0, .01, 3, 4, 0.001, 11, 5, 15, 46)
start[3, ] <- c(0, .01, 3, 4, 0.001, 11, 5, 15, 46)
start[4, ] <- c(0, .01, 3, 4, 0.001, 11, 5, 15, 46)

It looks like this:

       Cop-1 Cop-2 LEW-1 Lew-3 Cop-3 SHR-2 LEW-2 SHRP-3 SHRP-1
Gene_1     0  0.01     3     4 0.001    11     5     15     46
Gene_2     0  0.01     3     4 0.001    11     5     15     46
Gene_3     0  0.01     3     4 0.001    11     5     15     46
Gene_4     0  0.01     3     4 0.001    11     5     15     46

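I would like to scan this dataset and produce a new, recoded dataset according to the following criteria:

If the values of Gene_n are >= 10 for all replicates (e.g. SHRP-1, 2 and 3), then in the new matrix the SHRP value for Gene_n should be 1. If the values of Gene_n are < 1 for all replicates (e.g. Cop-1, 2 and 3), then in the new matrix the Cop value for Gene_n should be 0. Any other scenario (e.g. LEW-1, 2 and 3) should be assigned 0.5.

The final dataset should look like this: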

cn2 <- c("Cop", "LEW", "SHRP")
end <- matrix(nrow = 4, ncol = 3)
colnames(end) <- cn2
rownames(end) <- rn
end[1, ] <- c(0, 0.5, 1)
end[2, ] <- c(0, 0.5, 1)
end[3, ] <- c(0, 0.5, 1)
end[4, ] <- c(0, 0.5, 1) 

       Cop LEW SHRP
Gene_1   0 0.5    1
Gene_2   0 0.5    1
Gene_3   0 0.5    1
Gene_4   0 0.5    1

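Thanks for any assistance. I have tried the split function and dplyr, but could not get the desired result. I found this question by searching (Split data frame based on column name pattern and then rebind into 1 data frame), and it got me close, but again, not the result I need.

Thank you for your help.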

Answer 1

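cn2 <- c("Cop", "LEW", "SHRP")
end <- sapply(cn2, function(x) {
  # Columns belonging to this group, e.g. "Cop-1", "Cop-2", "Cop-3".
  # ignore.case = TRUE so that "Lew-3" is counted as a LEW replicate.
  cols <- grep(paste0("^", x, "-[0-9]+$"), colnames(start), ignore.case = TRUE)
  # drop = FALSE keeps a matrix even when only one column matches
  apply(start[, cols, drop = FALSE], MARGIN = 1, function(y) {
    if (all(y >= 10, na.rm = TRUE)) return(1)  # every replicate is >= 10
    if (all(y < 1, na.rm = TRUE)) return(0)    # every replicate is < 1
    return(0.5)                                # any other case
  })
})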

rownames(end) <- rn

       Cop LEW SHRP
Gene_1   0 0.5    1
Gene_2   0 0.5    1
Gene_3   0 0.5    1
Gene_4   0 0.5    1
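
If you would rather not build a regular expression per group, one possible variant is to derive a group label for every column first and then subset by label. This is only a sketch under a few assumptions: the replicate suffix is always "-<number>", case differences such as "Lew" vs "LEW" should be ignored, and `recode_group`, `grp`, `keep` and `end2` are names made up here for illustration.

```r
# Rebuild the example matrix from the question
cn <- c("Cop-1", "Cop-2", "LEW-1", "Lew-3", "Cop-3", "SHR-2", "LEW-2",
        "SHRP-3", "SHRP-1")
rn <- paste0("Gene_", 1:4)
vals <- c(0, .01, 3, 4, 0.001, 11, 5, 15, 46)
start <- matrix(rep(vals, each = 4), nrow = 4, dimnames = list(rn, cn))

# Group label = column name without the "-<number>" suffix, upper-cased
grp <- toupper(sub("-[0-9]+$", "", colnames(start)))

# Hypothetical helper: recode the replicate values of one gene in one group
recode_group <- function(y) {
  if (all(y >= 10, na.rm = TRUE)) return(1)
  if (all(y < 1, na.rm = TRUE)) return(0)
  0.5
}

# Groups to report (SHR is left out, as in the desired output)
keep <- c("COP", "LEW", "SHRP")
end2 <- sapply(keep, function(g) {
  apply(start[, grp == g, drop = FALSE], MARGIN = 1, recode_group)
})
colnames(end2) <- c("Cop", "LEW", "SHRP")
end2
```

Because the labels are computed once, adding another group only means adding its name to `keep`.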

Aggregate matrix by colname partial match and then based on conditions
