How do I print the highest-scoring sequence from a position-specific scoring matrix (PSSM) in R?

Asked: 2018-03-27 11:13:00

Tags: r dataframe matrix

Example PSSM matrix

The data frame (df) looks like this:

   v1   v2  v3  v4  v5  v6  v7  v8  v9  v10 v11 v12 v13 v14 v15
A   1   2   2   1   2   1   1   2   1   2   1   0   0   2   2
T   1   1   1   2   1   1   1   0   2   2   2   2   2   0   1
G   0   2   1   1   0   0   1   0   2   2   1   2   0   1   1
C   1   1   2   2   0   1   2   2   1   0   0   1   2   2   0

How can I extract the highest-scoring sequence from the PSSM matrix?

    seq = "ATGCGGCATTAT"
    # split seq into windows of length 5 (in this case)
    # splitted = split_n(seq, 5)
    # Example of the desired output:
    #    start_from  sequence  value
    # 0  0           ATGCG     5
    # 1  1           TGCGG     4
    # 2  2           GCGGC     5
    # 3  3           CGGCA     6
    # 4  4           GGCAT     6
    # 5  5           GCATT     8
    # 6  6           CATTA     9
    # 7  7           ATTAT     9

I also tried another approach, but it fails with an error. In this case, how do I retrieve the highest-scoring sequence?

        library(Biostrings)
        dna = DNAString("ATGCGGCATTATATGCGGCATTATATGCGGCATTAT")

        pwm = rbind(A=c(1,2,2,1,2,1,1,2,1,2,1,0,0,2,2),
                    T=c(1,1,1,2,1,1,1,0,2,2,2,2,2,0,1),
                    G=c(0,2,1,1,0,0,1,0,2,2,1,2,0,1,1),
                    C=c(1,1,2,2,0,1,2,2,1,0,0,1,2,2,0))

        pwm = pwm + 1
        i = 1
        while (i <= ncol(pwm)) {
              pwm[,i]<-pwm[,i]/sum(pwm[,i])
              i = i + 1
        }
        pssm = log2(pwm/0.25)
        scores = PWMscoreStartingAt(pssm, dna, starting.at=1:(length(dna)-ncol(pwm)+1))
    #Error in .normargPwm(pwm) : 'rownames(pwm)' must be the 4 DNA bases ('DNA_BASES')
   #print(max(scores)) 
   #print(which.max(scores))
   #in this case how do i retrieve the sequence with highest score 
   #print the sequence with highest score

1 answer:

Answer 0 (score: 0)

This is not a complete answer (it would be too messy to write in a comment):

Your matrix should be defined as follows (the first column is the first position!):

library(data.table)  # provides fread()
library(magrittr)    # provides %>%

df <-
  fread("rname   1   2   3   4   5   6   7   8   9
         A   2   1   0   1   2   0   1   0   2
         T   1   1   0   0   2   1   2   1   0
         G   1   0   0   2   2   2   2   1   2
         C   1   2   1   2   0   0   1   0   2", header = TRUE) %>% data.frame(., row.names = 1)

It looks like this:

  X1 X2 X3 X4 X5 X6 X7 X8 X9
A  2  1  0  1  2  0  1  0  2
T  1  1  0  0  2  1  2  1  0
G  1  0  0  2  2  2  2  1  2
C  1  2  1  2  0  0  1  0  2

Then use "matrix numeric indexing", which is described in:

?"["

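For example:

m <- as.matrix(df)  # indexing with a two-column matrix needs a matrix, not a data frame
m[cbind(c(1, 2, 1), 3:1)]  # matrix numeric index: picks m[1, 3], m[2, 2], m[1, 1]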
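To make the indexing idea more concrete, here is a minimal sketch of scoring every window of a sequence with it. It uses the 15-column matrix and the 12-letter sequence from the question. `score_windows` is a hypothetical helper name, not a function from any package, and I am assuming that a window of width 5 is scored against the first 5 columns of the matrix. The scores are computed from the matrix itself, so they need not match the illustrative values in the question's desired output.

```r
# PSSM from the question: rows are bases, columns are positions 1..15
pssm <- rbind(
  A = c(1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 0, 0, 2, 2),
  T = c(1, 1, 1, 2, 1, 1, 1, 0, 2, 2, 2, 2, 2, 0, 1),
  G = c(0, 2, 1, 1, 0, 0, 1, 0, 2, 2, 1, 2, 0, 1, 1),
  C = c(1, 1, 2, 2, 0, 1, 2, 2, 1, 0, 0, 1, 2, 2, 0)
)

# Hypothetical helper: score every window of `width` letters in `seq`.
# Assumption: window position k is scored against column k of `pssm`.
score_windows <- function(seq, pssm, width) {
  stopifnot(width <= ncol(pssm), nchar(seq) >= width)
  starts  <- seq_len(nchar(seq) - width + 1)
  windows <- substring(seq, starts, starts + width - 1)
  scores  <- vapply(windows, function(w) {
    bases <- strsplit(w, "")[[1]]
    # Two-column index matrix: (row of the base, column of the position)
    idx <- cbind(match(bases, rownames(pssm)), seq_len(width))
    sum(pssm[idx])
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(start_from = starts - 1L,  # 0-based, as in the question
             sequence   = windows,
             value      = scores,
             stringsAsFactors = FALSE)
}

res <- score_windows("ATGCGGCATTAT", pssm, width = 5)
print(res)

# First window with the highest score (which.max returns the first maximum)
best <- res[which.max(res$value), ]
print(best)
```

If several windows tie for the top score, `res[res$value == max(res$value), ]` returns all of them instead of only the first.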

Have fun building your own super function.
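On the Biostrings error in the question: as far as I know, the check behind that message wants the row names to be exactly `A`, `C`, `G`, `T` in that order (the value of `DNA_BASES`), while the question's matrix uses `A`, `T`, `G`, `C`. Treat that as my assumption about the cause. Under it, reordering the rows before scoring should be enough. The sketch below keeps the question's pseudocount and log-odds steps, and guards the Biostrings calls so the rest still runs where the package is not installed. `seq_chr`, `width`, and `starts` are names I introduced for illustration.

```r
seq_chr <- "ATGCGGCATTATATGCGGCATTATATGCGGCATTAT"

pwm <- rbind(A = c(1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 0, 0, 2, 2),
             T = c(1, 1, 1, 2, 1, 1, 1, 0, 2, 2, 2, 2, 2, 0, 1),
             G = c(0, 2, 1, 1, 0, 0, 1, 0, 2, 2, 1, 2, 0, 1, 1),
             C = c(1, 1, 2, 2, 0, 1, 2, 2, 1, 0, 0, 1, 2, 2, 0))

# Reorder the rows to A, C, G, T (assumed to be the order Biostrings requires)
pwm <- pwm[c("A", "C", "G", "T"), ]

# Same steps as in the question: pseudocount, column-wise normalisation, log-odds
pwm  <- pwm + 1
pwm  <- sweep(pwm, 2, colSums(pwm), "/")
pssm <- log2(pwm / 0.25)

width  <- ncol(pssm)
starts <- seq_len(nchar(seq_chr) - width + 1)

if (requireNamespace("Biostrings", quietly = TRUE)) {
  dna    <- Biostrings::DNAString(seq_chr)
  scores <- Biostrings::PWMscoreStartingAt(pssm, dna, starting.at = starts)
  best   <- which.max(scores)  # index of the first highest-scoring window
  print(data.frame(start    = starts[best],
                   sequence = substr(seq_chr, starts[best], starts[best] + width - 1),
                   score    = scores[best]))
}
```

Extracting the window with `substr` on the plain character string avoids depending on any further Biostrings accessors.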