Create submatrices from a data frame based on a factor in the sample names

Time: 2014-12-01 12:09:24

Tags: r

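I have a huge matrix of pairwise percent similarities between different samples. The samples belong to groups, and the group of each sample is given by the suffix "_n" in the row.names / header names. In a first step, I wanted to create submatrices consisting of all pairs within a single group (i.e., all samples from "_1"). However, I realized that I also need the pairwise submatrices between every combination of groups. So I would like to create a list of vectors named "_n1 vs _n2" (or similar) for all combinations of n, as indicated by the colored rectangles: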

Example

Reproducible code, provided by helpful Stack Overflow members, that handles the within-group case (both samples share the same "_n"):

    df <- structure(list(HQ673618_1 = c(NA, 90.8, 89.8, 89.6, 89.8, 88.9, 
    87.8, 88.2, 88.3), HQ674317_1 = c(90.8, NA, 98.6, 97.7, 98.4, 
    97.4, 94.9, 96.2, 95.1), EU686630_1 = c(89.8, 98.6, NA, 98.4, 
    98.9, 97.7, 95.4, 96.4, 95.8), EU686593_2 = c(89.6, 97.7, 98.4, 
    NA, 98.1, 96.8, 94.4, 95.6, 94.8), JN166322_2 = c(89.8, 98.4, 
    98.9, 98.1, NA, 97.5, 95.3, 96.5, 95.9), EU491340_2 = c(88.9, 
    97.4, 97.7, 96.8, 97.5, NA, 96.5, 97.7, 96), AB694259_3 = c(87.8, 
    94.9, 95.4, 94.4, 95.3, 96.5, NA, 98.3, 95.9), AB694258_3 = c(88.2, 
    96.2, 96.4, 95.6, 96.5, 97.7, 98.3, NA, 95.8), AB694462_3 = c(88.3, 
    95.1, 95.8, 94.8, 95.9, 96, 95.9, 95.8, NA)), .Names = c("HQ673618_1", 
    "HQ674317_1", "EU686630_1", "EU686593_2", "JN166322_2", "EU491340_2", 
    "AB694259_3", "AB694258_3", "AB694462_3"), class = "data.frame", row.names = c("HQ673618_1", 
    "HQ674317_1", "EU686630_1", "EU686593_2", "JN166322_2", "EU491340_2", 
    "AB694259_3", "AB694258_3", "AB694462_3"))


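    # Group label of each sample: everything after the last underscore
    indx <- gsub(".*_", "", names(df))
    # One square submatrix per group (rows and columns from the same group)
    sub.matrices <- lapply(unique(indx), function(x) {
      temp <- which(indx %in% x) 
      df[temp, temp]
    })
    # Keep each within-group pair once by taking the upper triangle
    unique_values <- lapply(sub.matrices, function(x) x[upper.tri(x)])
    names(unique_values) <- unique(indx)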

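This code needs to be extended so that it forms submatrices for any combination of the unique group indices, not only for the case where the row indices and column indices are the same `temp`.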
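One possible direction, as a minimal sketch rather than a tested answer: enumerate every pair of distinct groups with `combn()`, then index rows by the first group and columns by the second. The helper name `between_group_values` and the small `toy` data frame are hypothetical and only there to make the example self-contained. The sketch assumes the matrix is symmetric, that row and column order match, and that there are at least two groups.

```r
# Hypothetical helper: one numeric vector per pair of distinct groups.
between_group_values <- function(m) {
  # Group label of each sample: everything after the last underscore
  indx <- gsub(".*_", "", colnames(m))
  # All unordered pairs of distinct groups, e.g. (1,2), (1,3), (2,3)
  pairs <- combn(unique(indx), 2, simplify = FALSE)
  vals <- lapply(pairs, function(p) {
    # Rows from the first group, columns from the second;
    # drop = FALSE keeps the 2-D shape when a group has a single sample
    block <- m[indx == p[1], indx == p[2], drop = FALSE]
    # A between-group block has no diagonal or mirrored half,
    # so every cell is a distinct pair and no upper.tri() is needed
    as.vector(as.matrix(block))
  })
  names(vals) <- vapply(pairs,
                        function(p) paste0("_", p[1], " vs _", p[2]),
                        character(1))
  vals
}

# Toy symmetric similarity matrix (made-up values, not from the question)
ids <- c("A_1", "B_1", "C_2", "D_3")
toy <- as.data.frame(matrix(c(NA, 90, 80, 70,
                              90, NA, 85, 75,
                              80, 85, NA, 95,
                              70, 75, 95, NA),
                            nrow = 4, dimnames = list(ids, ids)))

res <- between_group_values(toy)
```

Calling `between_group_values(df)` on the data frame above should give the between-group vectors, which could then be combined with `unique_values` using `c()` to cover both the within-group and between-group rectangles.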

This question is based on my former question 1, which in turn is based on my former question 2.

Thank you very much!

0 answers:

No answers