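I have a table named mydf. It has hundreds of columns whose names start with 'ssd'. I need to grep these columns and convert the values in them to 0/0 if a cell holds a single letter, and to 0/1 if it holds two letters. I also need to skip (leave unchanged) any cell that contains 'ND', is blank, or holds anything other than a combination of the letters "A", "T", "G" and "C". The resulting table should look like myresult.
Here is mydf: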
wws:ddf:xx ssd:ddf:xx sqt:ddf:xx wws:dde:xy ssd:dde:xy sqt:dde:xy
                    G                               GA
                   GA                               AT
                   GT
                   ND                               GA
                   GT                               TG
                    G                                A
myresult
wws:ddf:xx ssd:ddf:xx sqt:ddf:xx wws:dde:xy ssd:dde:xy sqt:dde:xy
                  0/0                              0/1
                  0/1                              0/1
                  0/1
                   ND                              0/1
                  0/1                              0/1
                  0/0                              0/0
Answer 0 (score: 1)
Use this code to reproduce the example data:
mydf <-
  structure(list(`wws:ddf:xx` = c("", "", "", "", "", ""),
                 `ssd:ddf:xx` = c("G", "GA", "GT", "ND", "GT", "G"),
                 `sqt:ddf:xx` = c("", "", "", "", "", ""),
                 `wws:dde:xy` = c("", "", "", "", "", ""),
                 `ssd:dde:xy` = c("GA", "AT", "", "GA", "TG", "A"),
                 `sqt:dde:xy` = c("", "", "", "", "", "")),
            .Names = c("wws:ddf:xx", "ssd:ddf:xx", "sqt:ddf:xx", "wws:dde:xy", "ssd:dde:xy", "sqt:dde:xy"),
            row.names = c(NA, -6L), class = "data.frame")
I wrote a function that performs the change on a single column:
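change <- function(x) {
  # work on character values (guards against factor columns)
  x <- as.character(x)
  # for ease, change all valid letters to digit 1
  y <- gsub("[ATGC]", "1", x)
  # count number of digits 1
  z <- sapply(strsplit(y, ""), function(s) sum(s == "1"))
  # corresponding text for number of digits (1 or 2), to be mapped later
  txt <- c("0/0", "0/1")
  # identify cells made up of exactly one or two valid letters and nothing
  # else, so that mixed cells such as "GN" or longer ones such as "GAT" are skipped
  idx <- which(grepl("^[ATGC]{1,2}$", x))
  # replace those cells with the corresponding text in the mapping above
  x[idx] <- txt[z[idx]]
  return(x)
}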
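The rules this function is meant to encode (one valid letter gives 0/0, two give 0/1, everything else is left alone) can also be written with a single anchored regular expression. The following is only a sketch under that reading of the question, not the method used in this answer; the name `recode_alleles` is made up for illustration.

```r
# Hypothetical helper (not part of the original answer): recode a character
# vector so that one valid letter becomes "0/0" and two become "0/1".
recode_alleles <- function(x) {
  x <- as.character(x)
  # TRUE only for cells consisting of exactly one or two of A, T, G, C;
  # grepl() returns FALSE for NA, so missing values are skipped as well
  valid <- grepl("^[ATGC]{1,2}$", x)
  # nchar() is 1 or 2 for valid cells, so it can index the lookup directly
  x[valid] <- c("0/0", "0/1")[nchar(x[valid])]
  x
}

res <- recode_alleles(c("G", "GA", "", "ND", "GN", "GAT", NA))
print(res)
```

Blank cells, "ND", mixed cells such as "GN", cells longer than two letters, and NA all pass through unchanged.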
Then identify the columns whose names start with ssd:
ssdcols <- grep("^ssd", names(mydf))
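To see what this selection picks up, it can help to run the pattern on a plain vector of names. The sketch below uses the example's column names plus one made-up name, "x:ssd:zz", which is an assumption added only to show why the `^` anchor matters.

```r
# Column names from the example, plus one invented name ("x:ssd:zz")
# that contains "ssd" without starting with it.
cols <- c("wws:ddf:xx", "ssd:ddf:xx", "sqt:ddf:xx", "x:ssd:zz", "ssd:dde:xy")

pos      <- grep("^ssd", cols)                # positions of matching names
matched  <- grep("^ssd", cols, value = TRUE)  # the matching names themselves
unanch   <- grep("ssd", cols, value = TRUE)   # unanchored: also matches "x:ssd:zz"
is_ssd   <- startsWith(cols, "ssd")           # logical alternative without a regex

print(pos)
print(matched)
print(unanch)
print(is_ssd)
```

Either the integer positions or the logical vector can be used to index the data frame's columns.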
Then apply the function to all such columns, writing the result back into the data frame:
mydf[ssdcols] <- lapply(mydf[ssdcols], change)
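A note on indexing in this step: when only one column matches, `df[, cols]` returns a plain vector rather than a data frame, so `lapply` would iterate over individual cells instead of columns. The sketch below shows the difference on a small made-up data frame; the names `df` and `ssd1` and the use of `tolower` as a stand-in for the real recoding function are assumptions for illustration only.

```r
# Made-up data frame with a single column whose name starts with "ssd"
df <- data.frame(a = c("G", "GA"), ssd1 = c("G", "ND"),
                 stringsAsFactors = FALSE)
cols <- grep("^ssd", names(df))

dropped <- is.data.frame(df[, cols])                # single column collapses to a vector
kept    <- is.data.frame(df[cols])                  # list-style indexing keeps a data frame
kept2   <- is.data.frame(df[, cols, drop = FALSE])  # drop = FALSE also keeps it

# lapply() over df[cols] visits each selected column exactly once;
# tolower() stands in for the real per-column function here
df[cols] <- lapply(df[cols], tolower)
print(df)
```

With list-style indexing the same line works whether one column or hundreds match, and the untouched columns keep their original names and values.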
The output is as required:
> mydf
  wws:ddf:xx ssd:ddf:xx sqt:ddf:xx wws:dde:xy ssd:dde:xy sqt:dde:xy
1                   0/0                              0/1
2                   0/1                              0/1
3                   0/1
4                    ND                              0/1
5                   0/1                              0/1
6                   0/0                              0/0