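I have been circling around this programming problem for a while, but I have not found a clear answer yet.
I have two objects, say a and b. Object a is a character vector representing an RNA sequence, like this: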
> a
[1] "C" "A" "C" "C" "U" "U" "G" "U" "C" "C" "U" "C" "A" "C" "G" "G" "U" "C" "C" "A" "G" "U" "U" "U" "U" "C" "C" "C" "A" "G"
[31] "G" "A" "A" "U" "C" "C" "C" "U" "U" "A" "G" "A" "U" "G" "C" "U" "G" "A" "G" "A" "U" "G" "G" "G" "G" "A" "U" "U" "C" "C"
[61] "U" "G" "G" "A" "A" "A" "U" "A" "C" "U" "G" "U" "U" "C" "U" "U" "G" "A" "G" "G" "U" "C" "A" "U" "G" "G"
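Object b is another character vector representing the folded structure. The symbol "(" marks a letter of a that is paired with another letter in the same sequence, and that partner is marked ")". The symbol "." means the letter is unpaired.
Object b looks like this: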
> b
[1] "." "." "(" "(" "." "(" "(" "." "(" "(" "(" "(" "(" "." "(" "(" "." "." "(" "(" "(" "(" "." "(" "(" "." "(" "(" "(" "("
[31] "(" "(" "(" "(" "(" "(" "(" "(" "." "." "." "." "." "." "." "." "." "." "." "." "." ")" ")" ")" ")" ")" ")" ")" ")" ")"
[61] ")" ")" ")" "." ")" ")" "." ")" ")" ")" ")" "." "." ")" ")" ")" ")" ")" ")" ")" "." ")" ")" "." ")" ")"
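a and b contain the same number of characters, so the first character of b corresponds to the first character of a, and so on. In this case a[1] is "C" and corresponds to b[1], which is ".", so this letter of the sequence is unpaired. At b[3] we reach the first "(", so the first paired letter in a is a[3], a "C". That first "(" in b, i.e. the paired "C" in a, joins the last ")" in b, which is b[86] and therefore corresponds to a[86], a "G".
The first "(" in b pairs with the last ")" in b, the second "(" with the second-to-last ")", and so on.
As you can see, my goal is to determine how many A-U, C-G and G-U pairs occur in the sequence.
The information is all there, but I cannot think of a programmatic way in R to extract it (I am building an algorithm that extracts other features from these two objects as well).
I thought about extracting the index of every "(" and every ")", using those indices to find the corresponding letters in a, and then combining a[3] with a[86] and so on to form another object.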
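A minimal sketch of that index idea, using a short toy sequence rather than the real data (the vectors and the names `open_idx`, `close_idx` and `pairs_df` are made up for illustration). It is only meant for the case where every "(" comes before the first ")":

```r
# Toy example (hypothetical data, not the sequence above)
a <- c("G", "C", "A", "U", "A", "G", "C")
b <- c("(", "(", ".", ".", ".", ")", ")")

open_idx  <- which(b == "(")        # positions of the opening brackets
close_idx <- rev(which(b == ")"))   # closing brackets, last one first

# Guard: this shortcut assumes a single hairpin (all "(" before any ")")
stopifnot(length(open_idx) == length(close_idx),
          max(open_idx) < min(close_idx))

pairs_df <- data.frame(i = open_idx,
                       j = close_idx,
                       pair = paste0(a[open_idx], a[close_idx]),
                       stringsAsFactors = FALSE)
pairs_df
```

Keeping the positions i and j next to each pair makes it easy to check by eye that the right letters were joined.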
The desired output would be an object made up of the pair combinations, say c:
> c
[1] "CG" "CG" "UA" "GC" "CG" "CG" "UA" "CG" "AU" "GU" "GC"....
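That way I could count the CG and GC pairs and add them together, do the same for AU and UA, and for GU and UG, and so obtain the number of AU, GC and GU pairs in the sequence.
Any help?
Answer 0 (score: 0)
When every "(" comes before the first ")", you can simply extract the two sub-vectors, reverse the second, and combine them: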
data.frame(pair1 = a[b == "("], pair2 = rev(a[b == ")"]))
Or you can use
mapply(paste0, a[b == "("], rev(a[b == ")"]))
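Once the pairs are available as a character vector, the counting step from the question can be sketched as below. The vector `pair_vec` and the lookup table `classes` are hypothetical names introduced for this example; the lookup simply maps both orientations of a pair onto one label.

```r
# Hypothetical pairs vector, in the same shape as the desired output
pair_vec <- c("CG", "CG", "UA", "GC", "AU", "GU", "UG")

# Map each orientation onto a single label per pair class
classes <- c(AU = "AU", UA = "AU",
             CG = "GC", GC = "GC",
             GU = "GU", UG = "GU")

# Anything not in the lookup (e.g. "AA") becomes NA and is not counted
pair_class <- factor(unname(classes[pair_vec]), levels = c("AU", "GC", "GU"))
counts <- table(pair_class)
counts
```

Using a factor with fixed levels means a class with zero occurrences still shows up in the result with a count of 0.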
If you are actually looking for a general solution that handles multiple loops, you can combine a stack with a for loop:
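library(dequer)
s <- stack()   # bases whose "(" has not been matched yet
q <- queue()   # finished pairs, in the order their ")" is reached
# Note: this relies on pop(s) returning the removed element;
# check that against the dequer version you have installed.
for (i in seq_along(a)) {
  if (b[i] == "(")
    push(s, a[i])
  else if (b[i] == ")")
    pushback(q, paste0(pop(s), a[i]))
}
unlist(as.list(q))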
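If you would rather not depend on an extra package, the same stack idea can be sketched in base R with an integer vector used as the stack. This is an illustrative alternative, not part of dequer; the function name `pair_bases` and the toy data are made up for the example, and it assumes the structure is properly nested (no pseudoknots).

```r
# Pair bases for a properly nested dot-bracket structure, base R only
pair_bases <- function(a, b) {
  stopifnot(length(a) == length(b))
  open_idx <- integer(0)      # stack of positions of unmatched "("
  pairs <- character(0)
  for (i in seq_along(b)) {
    if (b[i] == "(") {
      open_idx <- c(open_idx, i)                  # push
    } else if (b[i] == ")") {
      if (length(open_idx) == 0) stop("unbalanced ')' at position ", i)
      j <- open_idx[length(open_idx)]             # top of the stack
      open_idx <- open_idx[-length(open_idx)]     # pop
      pairs <- c(pairs, paste0(a[j], a[i]))
    }
  }
  if (length(open_idx) > 0) stop("unbalanced '(' at position ", open_idx[1])
  pairs
}

# Toy structure with two separate stem-loops (hypothetical data)
a2 <- c("G", "A", "C", "U", "A", "G")
b2 <- c("(", ".", ")", "(", ".", ")")
pair_bases(a2, b2)
```

The pairs come back in the order their closing bracket is reached, so in a nested stem the innermost pair is listed first. On the two-loop toy input the simple reverse-and-combine approach would join the wrong letters, which is why the stack is needed there.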