Transformation in R

Time: 2015-09-18 03:06:35

Tags: r

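I have a matrix named mymat (approximately 446664 x 234). It has REF and ALT columns, each of which holds a single letter: A, T, G, or C. In the columns ending in .GT, I want to replace the genotype codes with these letters. The rule is: a 0 is replaced by the letter from the REF column, and a 1 is replaced by the letter from the ALT column. An NA is replaced by "0 0" (zero, space, zero). Finally, I need to transpose the .GT columns so that each one becomes a row, as shown in the result. In the result, everything is separated by spaces.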

mymat<-structure(list(REF = structure(c(1L, 4L, 3L, 2L, 3L), .Label = c("A", 
"C", "G", "T"), class = "factor"), ALT = structure(c(6L, 6L, 
1L, 9L, 1L), .Label = c("A", "A", "A", "A,T", "C", "C", "C", 
"G", "G", "T"), class = "factor"), X860.GT = structure(c(1L, 3L, 
2L, 1L, 1L), .Label = c("NA", "0/0", "0/1", "0/1", "1/1"), class = "factor"), 
    X861.GT = structure(c(1L, 6L, 2L, 1L, 1L), .Label = c("NA", 
    "0/0", "0/1", "0/1", "1/1", "1/1"), class = "factor"), X862.GT = structure(c(6L, 
    3L, 1L, 2L, 1L), .Label = c("NA", "0/0", "0/1", "0/1", "1/1", 
    "1/1"), class = "factor")), .Names = c("REF", "ALT", "X860.GT", 
"X861.GT", "X862.GT"), row.names = c(NA, -5L), class = "data.frame")

Result

X860 0 0 T C G G 0 0 0 0
X861 0 0 C C G G 0 0 0 0 
X862 C C T C 0 0 C C 0 0

1 answer:

Answer 0: (score: 1)

Not very elegant, but it gets the job done.

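# Convert every factor column to character so values can be compared and overwritten
m = as.data.frame(lapply(mymat, as.character), stringsAsFactors=FALSE)
# Missing genotypes (the string "NA" in the sample, or a true NA) become "0 0"
m[is.na(m) | m=="NA"] = '0 0'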

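# Replace each genotype code in x with the REF (column 1) / ALT (column 2) letters of the same row
fix = function(x) {
  for (i in seq_along(x)) {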
    if (x[i] == '0/0') {
      x[i] = paste(m[i,1], m[i,1])
    }
    else if (x[i] == '0/1') {
      x[i] = paste(m[i,1], m[i,2])
    }
    else if (x[i] == '1/0') {
      x[i] = paste(m[i,2], m[i,1])
    }
    else if (x[i] == '1/1') {
      x[i] = paste(m[i,2], m[i,2])
    }
  }
  x
}

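# 3:5 are the .GT columns of the sample data; for the full matrix,
# they can be selected with grep("\\.GT$", names(m))
m[,3:5] <- lapply(m[,3:5], fix)
# Split each "X Y" pair into two cells, then transpose so each sample becomes a row
m = t(data.frame(lapply(m[,3:5], function (x) unlist (strsplit(x," ")))))
# Strip the literal ".GT" suffix (the dot is escaped so it is not a regex wildcard)
rownames(m) = sub("\\.GT$","",rownames(m))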

Output of m:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
X860 "0"  "0"  "T"  "C"  "G"  "G"  "0"  "0"  "0"  "0"  
X861 "0"  "0"  "C"  "C"  "G"  "G"  "0"  "0"  "0"  "0"  
X862 "C"  "C"  "T"  "C"  "0"  "0"  "C"  "C"  "0"  "0"
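
For a matrix with roughly 446664 rows, the element-by-element loop in fix may be slow. Below is a minimal vectorized sketch of the same mapping, not a tested replacement for the full data. It assumes genotypes are unphased strings of the form "a/b" containing only the codes 0 and 1; anything else (including "NA", a true NA, or a code such as 2 for a multi-allelic ALT) is written as "0". The helper names decode and gt_to_alleles are made up for illustration, and the data frame simply restates the question's sample values as plain strings.

```r
# Sample data from the question, written as plain character columns
mymat <- data.frame(
  REF     = c("A", "T", "G", "C", "G"),
  ALT     = c("C", "C", "A", "G", "A"),
  X860.GT = c("NA", "0/1", "0/0", "NA", "NA"),
  X861.GT = c("NA", "1/1", "0/0", "NA", "NA"),
  X862.GT = c("1/1", "0/1", "NA", "0/0", "NA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map allele codes to letters for all rows at once.
# "0" -> REF letter, "1" -> ALT letter, anything else (including NA) -> "0".
decode <- function(code, ref, alt) {
  ifelse(code %in% "0", ref, ifelse(code %in% "1", alt, "0"))
}

# Hypothetical helper: turn one .GT column into an interleaved allele vector
# (first allele of row 1, second allele of row 1, first allele of row 2, ...).
gt_to_alleles <- function(gt, ref, alt) {
  gt <- as.character(gt)
  a1 <- decode(sub("/.*$", "", gt), ref, alt)  # code before the slash
  a2 <- decode(sub("^.*/", "", gt), ref, alt)  # code after the slash
  as.vector(rbind(a1, a2))
}

# Select the genotype columns by name rather than by position
gt_cols <- grep("\\.GT$", names(mymat), value = TRUE)
ref <- as.character(mymat$REF)
alt <- as.character(mymat$ALT)

# One row per sample, two cells per original row
res <- t(sapply(mymat[gt_cols], gt_to_alleles, ref = ref, alt = alt))
rownames(res) <- sub("\\.GT$", "", rownames(res))

# Space-separated lines in the layout the question asks for
lines <- paste(rownames(res), apply(res, 1, paste, collapse = " "))
writeLines(lines)
```

On the sample data this prints the three lines from the question's Result block. To save them to a file, pass a path as the second argument of writeLines.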