In R

Time: 2018-04-14 20:44:48

Tags: r string indexing structure

I have been going back and forth on this programming problem, but I still haven't found a clear answer...

I have two objects, say a and b. Object a is a character vector representing an RNA sequence, as shown below:

> a
[1] "C" "A" "C" "C" "U" "U" "G" "U" "C" "C" "U" "C" "A" "C" "G" "G" "U" "C" "C" "A" "G" "U" "U" "U" "U" "C" "C" "C" "A" "G"
[31] "G" "A" "A" "U" "C" "C" "C" "U" "U" "A" "G" "A" "U" "G" "C" "U" "G" "A" "G" "A" "U" "G" "G" "G" "G" "A" "U" "U" "C" "C"
[61] "U" "G" "G" "A" "A" "A" "U" "A" "C" "U" "G" "U" "U" "C" "U" "U" "G" "A" "G" "G" "U" "C" "A" "U" "G" "G"

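Object b is another character vector representing the folded structure, where a "(" symbol marks a letter from a that is paired with another letter in the same sequence, which is marked ")". The symbol "." means that the letter is unpaired.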

Object b looks like this:

> b
[1] "." "." "(" "(" "." "(" "(" "." "(" "(" "(" "(" "(" "." "(" "(" "." "." "(" "(" "(" "(" "." "(" "(" "." "(" "(" "(" "("
[31] "(" "(" "(" "(" "(" "(" "(" "(" "." "." "." "." "." "." "." "." "." "." "." "." "." ")" ")" ")" ")" ")" ")" ")" ")" ")"
[61] ")" ")" ")" "." ")" ")" "." ")" ")" ")" ")" "." "." ")" ")" ")" ")" ")" ")" ")" "." ")" ")" "." ")" ")"

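If you count the number of characters in each of the objects a and b, they are the same, which means that the first character in b corresponds to the first character in a, and so on. In this case, a[1] is "C", which corresponds to b[1], which is ".", meaning that this letter in the sequence is unpaired. When we reach b[3], it is "(", the first paired letter, which corresponds to a[3], or "C". This first "(" in b, which marks the paired letter "C" in a, joins with the last ")" symbol in b, which is b[86] and therefore corresponds to a[86], which is "G".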

The first "(" in b forms a pair with the last ")" in b, the second "(" with the second-to-last ")", and so on.
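The correspondence described above can be checked with a few one-liners before doing any pairing. This is only a minimal sketch on a made-up toy input; the names `a_toy`, `b_toy`, `same_length`, `balanced`, and `unpaired` are illustrative and not part of the original question.

```r
# Toy sequence and structure (hypothetical data, much shorter than a and b)
a_toy <- c("G", "C", "A", "U", "G", "C")
b_toy <- c("(", "(", ".", ".", ")", ")")

# Position i in the structure must describe position i in the sequence
same_length <- length(a_toy) == length(b_toy)

# Every "(" needs a matching ")"
balanced <- sum(b_toy == "(") == sum(b_toy == ")")

# Letters marked "." are unpaired
unpaired <- a_toy[b_toy == "."]
```

If either check fails on real data, the two vectors are probably misaligned and any pairing derived from them would be unreliable.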

As you can see, my goal is to determine how many A-U, C-G, and G-U pairs occur in the sequence.

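The information is all there, but I cannot think of a programmatic way to solve this in R (I am building my algorithm to extract other features from these two objects).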

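I thought about extracting the index of each "(" and each ")" and using those indices to find the corresponding letters in a, then combining a[3] with a[86] and so on to form another object.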
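The index idea in the previous paragraph might look roughly like the sketch below. It assumes a single hairpin, meaning every "(" comes before the first ")", so the k-th opening index pairs with the k-th closing index counted from the end. The toy vectors and the names `open_idx`, `close_idx`, and `pairs_toy` are illustrative only.

```r
# Toy single-hairpin input (hypothetical data)
a_toy <- c("C", "G", "C", "U", "U", "G", "C")
b_toy <- c(".", "(", "(", ".", ".", ")", ")")

# Indices of opening brackets, outermost first
open_idx <- which(b_toy == "(")

# Indices of closing brackets, reversed so the outermost comes first
close_idx <- rev(which(b_toy == ")"))

# Combine the letters at matching positions, e.g. a_toy[2] with a_toy[7]
pairs_toy <- paste0(a_toy[open_idx], a_toy[close_idx])
```

Keeping the indices around, rather than only the letters, also makes it possible to report where each pair sits in the sequence.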

The desired output would be an object made up of the pair combinations, say c:

> c
[1] "CG" "CG" "UA" "GC" "CG" "CG" "UA" "CG" "AU" "GU" "GC"....

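That way I can count the CG and GC pairs and add them together, do the same for AU and UA and for GU and UG, and so obtain the number of AU, GC, and GU pairs in the sequence.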
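One possible way to do that counting, once a vector of pairs exists, is to sort the two letters inside each pair so that "GC" and "CG" collapse into the same label, and then tabulate. This is a sketch under that assumption; `pairs_toy`, `canon`, and `counts` are illustrative names, and the input is made up.

```r
# Hypothetical vector of pairs in the same form as the desired object c
pairs_toy <- c("CG", "GC", "UA", "AU", "GU", "UG", "CG")

# Sort the letters within each pair so orientation no longer matters:
# "GC" -> "CG", "UA" -> "AU", "UG" -> "GU"
canon <- vapply(
  strsplit(pairs_toy, ""),
  function(p) paste(sort(p), collapse = ""),
  character(1)
)

# Count each pair type
counts <- table(canon)
```

With this labeling, G-C pairs are reported under the name "CG" and A-U pairs under "AU".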

Any help?

1 answer:

Answer 0: (score: 0)

When all of the "(" come before the first ")", you can simply extract the two subvectors, reverse the second one, and combine them:

data.frame(pair1 = a[b == "("], pair2 = rev(a[b == ")"]))

Or you can use

mapply(paste0, a[b == "("], rev(a[b == ")"]))

If you are actually looking for a general solution that handles multiple loops, you can combine a stack with a for loop:

library(dequer)
s <- stack()  # letters whose "(" has not been closed yet
q <- queue()  # pairs collected so far
for (i in seq_along(a)) {
    if(b[i] == "(")
        push(s, a[i])
    else if(b[i] == ")")
        pushback(q, paste0(pop(s), a[i]))
}
unlist(as.list(q))
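The same stack idea can also be sketched in base R, with an integer vector of open positions standing in for the stack, in case an extra package is not wanted. This is a substitute technique rather than the answer's own code; the function name `match_pairs` and the toy input are hypothetical.

```r
# Hypothetical helper: pair letters by bracket matching, using an integer
# vector as a stack of positions whose "(" is still open.
match_pairs <- function(a, b) {
  stopifnot(length(a) == length(b))
  open <- integer(0)     # stack of unmatched "(" positions
  pairs <- character(0)  # pairs in the order their ")" is encountered
  for (i in seq_along(b)) {
    if (b[i] == "(") {
      open <- c(open, i)                 # push
    } else if (b[i] == ")") {
      if (length(open) == 0L) stop("unmatched ')' at position ", i)
      j <- open[length(open)]            # most recent open position
      open <- open[-length(open)]        # pop
      pairs <- c(pairs, paste0(a[j], a[i]))
    }
  }
  if (length(open) > 0L) stop("unmatched '(' at position ", open[1])
  pairs
}

# Toy structure with two separate loops: "(())" followed by "()"
a_toy <- c("G", "C", "G", "C", "A", "U")
b_toy <- c("(", "(", ")", ")", "(", ")")
pairs_toy <- match_pairs(a_toy, b_toy)
```

Pairs come out innermost first within each loop, so the order differs from the desired object c in the question, but the counts per pair type are the same. On this toy input the simple reverse-and-combine approach would mispair the letters, because a ")" appears before the last "(".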