Question

I created the following matrix in R:

positions = cbind(seq(from = 20, to = 68, by = 4),seq(from = 22, to = 70, by = 4))

I also have the following string:

input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "

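I'm trying to use an apply function to build a list of substrings, substr(mystring, start.position, end.position), where the start index comes from positions[,1] and the end index from positions[,2]. I can do this easily with a for loop, but I thought apply would be faster.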
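For reference, here is a minimal sketch of the for-loop version mentioned above, assuming the string is stored in a variable named input (as the apply code implies); result is an illustrative name. Note that apply() itself iterates over rows in R code, so it is not necessarily faster than a preallocated loop like this one.

```r
positions <- cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4))
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "

# Preallocate one slot per row of positions (illustrative name)
result <- character(nrow(positions))
for (i in seq_len(nrow(positions))) {
  # Extract the characters from position positions[i, 1] to positions[i, 2]
  result[i] <- substr(input, positions[i, 1], positions[i, 2])
}
result
```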

I can get it to work as follows, but I'm wondering whether there is a cleaner way:

# Define the helper before apply() calls it
get.AA.seqres <- function(items){
  # cbind() with a string makes the whole matrix character, so convert back
  start.position = as.numeric(items[1])
  end.position = as.numeric(items[2])
  string = items[3]
  return(substr(string, start.position, end.position))
}

parse.me = cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4), input)
apply(parse.me, MARGIN = 1, get.AA.seqres)
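Because cbind() with the string input coerces the whole matrix to character, the helper above has to convert the positions back with as.numeric(). A minimal sketch of an alternative, still using apply(), keeps positions numeric and passes the string in through an anonymous function; the name aa is only illustrative.

```r
positions <- cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4))
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "

# positions stays a numeric matrix, so no as.numeric() conversion is needed;
# each row arrives in the function as a length-2 numeric vector c(start, end)
aa <- apply(positions, MARGIN = 1, function(row) substr(input, row[1], row[2]))
aa
```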

Answer 1

Try this:

> substring(input, positions[, 1], positions[, 2])
 [1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO"
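This works because substring() recycles its arguments to the length of the longest one, so the single string is paired with every start/end pair. substr(), by contrast, returns a result with the same length as its first argument. A small toy illustration (the string "abcdef" is made up for the example):

```r
x <- "abcdef"

# substring(): x is recycled to match the three start/end pairs
via_substring <- substring(x, 1:3, 3:5)
via_substring   # "abc" "bcd" "cde"

# substr(): result has length(x) == 1, so only the first pair is used
via_substr <- substr(x, 1:3, 3:5)
via_substr      # "abc"
```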

Answer 2

I like Andrie's practical suggestion, but if you need to go this route for some other reason, your problem sounds like something Vectorize() can solve:

#Your data
positions = cbind(seq(from = 20, to = 68, by = 4),seq(from = 22, to = 70, by = 4))
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "

#Vectorize the function substr()
vsubstr <- Vectorize(substr, USE.NAMES = FALSE)
vsubstr(input, positions[,1], positions[,2])
#-----
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO"
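Vectorize() builds its wrapper on mapply(), so calling mapply() directly gives the same result. USE.NAMES = FALSE stops mapply() from using the input string as element names. A sketch, with positions and input as defined above:

```r
positions <- cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4))
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "

# mapply() recycles the length-1 input across the 13 start/stop pairs
aa <- mapply(substr, input, positions[, 1], positions[, 2], USE.NAMES = FALSE)
aa
```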

#Or, see the note on recycling in the first paragraph of Details on the ?substr help page

substr(rep(input, nrow(positions)), positions[,1], positions[,2])
#-----
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO"

Getting specific substrings of a sequence
