Obtain specific substrings of a sequence

Date: 2012-05-28 17:43:52

Tags: string r

I created the following matrix in R:

positions = cbind(seq(from = 20, to = 68, by = 4),seq(from = 22, to = 70, by = 4))

I also have the following string:

"SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "

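I am trying to use the apply function to build a list of substr(mystring, start.position, end.position) results, where the start index comes from positions[,1] and the end index comes from positions[,2]. I can easily do this with a for loop, but I assumed apply would be faster.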

I can get it to work as follows, but I wonder whether there is a cleaner way:

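# Helper: each row arrives as a character vector (start, end, string),
# because cbind() with a string coerces the whole matrix to character
get.AA.seqres <- function(items){
  start.position = as.numeric(items[1])
  end.position = as.numeric(items[2])
  string = items[3]
  return(substr(string, start.position, end.position))
}

# input is the string shown above
parse.me = cbind(seq(from = 20, to = 68, by = 4),seq(from = 22, to = 70, by = 4), input)
apply(parse.me, MARGIN = 1, get.AA.seqres)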

2 answers:

Answer 0 (score: 3)

Try this:

> substring(input, positions[, 1], positions[, 2])
 [1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO"
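
A likely reason this works where a plain substr() call does not: substr() returns a result of the same length as its first argument, while substring() recycles its arguments to the length of the longest one. A minimal sketch of the difference, with `input` and `positions` repeated from the question so the snippet runs on its own (the names `first_only` and `all_pieces` are just for illustration):

```r
# Data from the question, repeated so the snippet is self-contained
positions <- cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4))
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "

# substr(): the result has the same length as `input` (1), so only the
# first start/stop pair is used
first_only <- substr(input, positions[, 1], positions[, 2])

# substring(): arguments are recycled to the longest one, one piece per row
all_pieces <- substring(input, positions[, 1], positions[, 2])

length(first_only)
length(all_pieces)
```

So for a single string and many start/stop pairs, substring() needs no apply() or loop at all.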

Answer 1 (score: 0)

I like Andrie's practical suggestion, but if you need to go down this route for other reasons, your problem sounds like it could be solved with Vectorize():

#Your data
positions = cbind(seq(from = 20, to = 68, by = 4),seq(from = 22, to = 70, by = 4))
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "

#Vectorize the function substr()
vsubstr <- Vectorize(substr, USE.NAMES = FALSE)
vsubstr(input, positions[,1], positions[,2])
#-----
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO"

#Or, read the help page on ?substr about the bit for recycling in the first paragraph of details

substr(rep(input, nrow(positions)), positions[,1], positions[,2])
#-----
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO"
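
For what it's worth, Vectorize() builds its wrapper on top of mapply(), so roughly the same thing can be written by calling mapply() directly. A small sketch, with the data redefined so it stands alone (the name `pieces` is only illustrative):

```r
# Data from the question, repeated so the snippet is self-contained
positions <- cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4))
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "

# mapply() calls substr() once per start/stop pair; the length-1 `input`
# is recycled across all pairs. USE.NAMES = FALSE keeps the result unnamed.
pieces <- mapply(substr, input, positions[, 1], positions[, 2], USE.NAMES = FALSE)
pieces
```

This is mostly useful when the function being applied is not already vectorized; for this particular task substring() remains the shorter option.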