How do I replace values in selected columns in R?

Asked: 2015-07-23 03:31:09

Tags: r

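I have a table named mydf. It has hundreds of columns whose names start with 'ssd'. I need to grep these columns and convert their values to 0/0 if the cell contains a single letter, and to 0/1 if it contains two letters. I also need to skip (leave unchanged) any cell that contains 'ND', is blank, or holds anything other than a combination of the letters "A", "T", "G" and "C". The resulting table should look like myresult.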

mydf

wws:ddf:xx  ssd:ddf:xx  sqt:ddf:xx  wws:dde:xy  ssd:dde:xy  sqt:dde:xy
               G                                     GA 
               GA                                    AT 
               GT                                       
               ND                                    GA 
               GT                                    TG 
               G                                     A  

myresult

wws:ddf:xx  ssd:ddf:xx  sqt:ddf:xx  wws:dde:xy  ssd:dde:xy  sqt:dde:xy
              0/0                                   0/1 
              0/1                                   0/1 
              0/1                                           
              ND                                    0/1 
              0/1                                   0/1 
              0/0                                   0/0 

1 answer:

Answer 0 (score: 1)

Use this code to reproduce the sample data:

mydf <-
  structure(list(`wws:ddf:xx` = c("", "", "", "", "", ""),
                 `ssd:ddf:xx` = c("G", "GA", "GT", "ND", "GT", "G"),
                 `sqt:ddf:xx` = c("", "", "", "", "", ""),
                 `wws:dde:xy` = c("", "", "", "", "", ""),
                 `ssd:dde:xy` = c("GA", "AT", "", "GA", "TG", "A"),
                 `sqt:dde:xy` = c("", "", "", "", "", "")),
            .Names = c("wws:ddf:xx", "ssd:ddf:xx", "sqt:ddf:xx", "wws:dde:xy", "ssd:dde:xy", "sqt:dde:xy"),
            row.names = c(NA, -6L), class = "data.frame")

I wrote a function that performs the replacement on a single column:

change <- function(x) {
  # for ease, change all valid letters to digit 1
  y <- gsub("[ATGC]", "1", x)
  # count number of digits 1
  z <- sapply(strsplit(y, ""), function(x) sum(x=="1"))
  # corresponding text for number of digits (1 or 2), to be mapped later
  txt <- c("0/0", "0/1")
  # identify rows where digits 1 are found
  idx <- which(z>0)
  # if there's digit 1 replace with corresponding text in mapping above
  x[idx] <- txt[z[idx]]
  return(x)
}
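
Note that the function above counts valid letters rather than matching whole cells. A mixed cell such as "GN" would therefore be treated as one letter, and a cell with three or more valid letters would index past the two-entry mapping and come back as NA. Neither case occurs in the sample data. If such cells can occur, a stricter variant might match the whole cell instead. This is only a sketch: the name change_strict is made up here and is not part of the original answer.

```r
# Hypothetical stricter variant (not from the original answer):
# only cells made up entirely of one or two of the letters
# A, T, G, C are replaced; everything else is left as it is.
change_strict <- function(x) {
  one <- grepl("^[ATGC]$", x)      # exactly one valid letter
  two <- grepl("^[ATGC]{2}$", x)   # exactly two valid letters
  x[one] <- "0/0"
  x[two] <- "0/1"
  x                                # blanks, "ND", "GN", "GAT" stay untouched
}

change_strict(c("G", "GA", "", "ND", "GN", "GAT"))
# returns c("0/0", "0/1", "", "ND", "GN", "GAT")
```

It can be dropped in wherever change is used, since it also takes and returns a character vector.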

Then identify the columns whose names start with ssd:

ssdcols <- grep("^ssd", names(mydf))
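
For reference, grep() returns the positions of the matching names here, and the ^ anchor restricts the match to names that begin with ssd. A small sketch using the column names from the sample data; the vector nms is just an illustrative stand-in for names(mydf):

```r
# Illustrative stand-in for names(mydf)
nms <- c("wws:ddf:xx", "ssd:ddf:xx", "sqt:ddf:xx",
         "wws:dde:xy", "ssd:dde:xy", "sqt:dde:xy")

grep("^ssd", nms)                 # positions of the matches: 2 5
grep("^ssd", nms, value = TRUE)   # the matching names themselves
startsWith(nms, "ssd")            # logical alternative that avoids a regex
```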

and apply the function to all such columns (saving the result as a data frame):

mydf[, ssdcols] <- as.data.frame(lapply(mydf[, ssdcols], change),
                                 stringsAsFactors = FALSE)
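
As a side note, assigning the list returned by lapply() directly to the selected columns should also work, without the intermediate as.data.frame() call. A minimal sketch on a toy data frame follows; df, cols and the stand-in function f are illustrative only and do not come from the answer:

```r
# Toy data frame; check.names = FALSE keeps the colons in the names
df <- data.frame(`ssd:a` = c("G", "GA"),
                 `wws:a` = c("x", "y"),
                 check.names = FALSE, stringsAsFactors = FALSE)

cols <- grep("^ssd", names(df))

# Simplified stand-in for the real per-column function
f <- function(x) ifelse(nchar(x) == 1, "0/0", "0/1")

# Assign the list of transformed columns back in place;
# column names and the other columns are left unchanged
df[cols] <- lapply(df[cols], f)

df
```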

The output is as required:

> mydf
  wws:ddf:xx ssd:ddf:xx sqt:ddf:xx wws:dde:xy ssd:dde:xy sqt:dde:xy
1                   0/0                              0/1           
2                   0/1                              0/1           
3                   0/1                                            
4                    ND                              0/1           
5                   0/1                              0/1           
6                   0/0                              0/0