The data frame (df) looks like this:
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15
A 1 2 2 1 2 1 1 2 1 2 1 0 0 2 2
T 1 1 1 2 1 1 1 0 2 2 2 2 2 0 1
G 0 2 1 1 0 0 1 0 2 2 1 2 0 1 1
C 1 1 2 2 0 1 2 2 1 0 0 1 2 2 0
How can I extract the highest-scoring sequence from the PSSM matrix?
seq = "ATGCGGCATTAT"
# split seq into overlapping windows of length 5 (in this case)
#splitted = split_n(seq,5)
# print_out:
# Example of desired output
#start_from sequence value
#0 0 ATGCG 5
#1 1 TGCGG 4
#2 2 GCGGC 5
#3 3 CGGCA 6
#4 4 GGCAT 6
#5 5 GCATT 8
#6 6 CATTA 9
#7 7 ATTAT 9
I also tried another approach, but it fails with an error. In this case, how do I retrieve the highest-scoring sequence?
library(Biostrings)
dna = DNAString("ATGCGGCATTATATGCGGCATTATATGCGGCATTAT")
pwm = rbind(A=c(1,2,2,1,2,1,1,2,1,2,1,0,0,2,2),
            T=c(1,1,1,2,1,1,1,0,2,2,2,2,2,0,1),
            G=c(0,2,1,1,0,0,1,0,2,2,1,2,0,1,1),
            C=c(1,1,2,2,0,1,2,2,1,0,0,1,2,2,0))
pwm = pwm + 1
i = 1
while (i <= ncol(pwm)) {
  pwm[,i] <- pwm[,i]/sum(pwm[,i])
  i = i + 1
}
pssm = log2(pwm/0.25)
scores = PWMscoreStartingAt(pssm, dna, starting.at=1:(length(dna)-ncol(pwm)+1))
#Error in .normargPwm(pwm) : 'rownames(pwm)' must be the 4 DNA bases ('DNA_BASES')
#print(max(scores))
#print(which.max(scores))
#in this case how do i retrieve the sequence with highest score
#print the sequence with highest score
Answer 0 (score: 0)
This is not a complete answer (it would have been too messy to write in a comment).
Your matrix should be defined as follows:
library(data.table)  # fread
library(magrittr)    # %>%
df <-
  fread("rname 1 2 3 4 5 6 7 8 9
A 2 1 0 1 2 0 1 0 2
T 1 1 0 0 2 1 2 1 0
G 1 0 0 2 2 2 2 1 2
C 1 2 1 2 0 0 1 0 2", header = TRUE) %>% data.frame(., row.names = 1)
m <- as.matrix(df)  # assumption: m below is the matrix form of df
which looks like:
  X1 X2 X3 X4 X5 X6 X7 X8 X9
A 2 1 0 1 2 0 1 0 2
T 1 1 0 0 2 1 2 1 0
G 1 0 0 2 2 2 2 1 2
C 1 2 1 2 0 0 1 0 2
Then use a "matrix numeric index" (see ?"["), where each row of the index matrix is one (row, column) pair.
For example:
m[cbind(c(1,2,1),3:1)]# matrix numeric index
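Applied to the question's data, that kind of lookup could look like the sketch below. It assumes that sequence position j is scored against matrix column j, which is inferred from the desired output in the question rather than stated there; under that assumption it reproduces the values 5, 4, 5, 6, 6, 8, 9, 9. The names dna_seq, pos_score and windows are illustrative only.

```r
# Score matrix from the question: rows are bases, columns are positions v1..v15
pssm <- rbind(
  A = c(1,2,2,1,2,1,1,2,1,2,1,0,0,2,2),
  T = c(1,1,1,2,1,1,1,0,2,2,2,2,2,0,1),
  G = c(0,2,1,1,0,0,1,0,2,2,1,2,0,1,1),
  C = c(1,1,2,2,0,1,2,2,1,0,0,1,2,2,0)
)
dna_seq <- "ATGCGGCATTAT"
k <- 5  # window length

bases <- strsplit(dna_seq, "")[[1]]

# Matrix numeric index: one (row, column) pair per sequence position.
# Assumption: position j of the sequence is scored against column j.
idx <- cbind(match(bases, rownames(pssm)), seq_along(bases))
pos_score <- pssm[idx]

# Sum the per-position scores over every window of length k
starts <- seq_len(length(bases) - k + 1)
windows <- data.frame(
  start_from = starts - 1,  # 0-based, as in the question's table
  sequence   = substring(dna_seq, starts, starts + k - 1),
  value      = sapply(starts, function(s) sum(pos_score[s:(s + k - 1)]))
)

# Keep every window that reaches the maximum (ties are possible)
best <- windows[windows$value == max(windows$value), ]
print(best)
```

With this data two windows tie for the top value, so filtering on max(windows$value) keeps both, whereas which.max would return only the first.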
Have fun building your own super function.
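As for the Biostrings error in the question, a likely cause is the row order: the message asks for the row names to be DNA_BASES, which is A, C, G, T, while the question's matrix uses A, T, G, C. The sketch below reorders the rows before scoring and then pulls out the best window with subseq. It assumes Biostrings is installed and has not been checked against every version; counts, starts and best_seq are illustrative names.

```r
library(Biostrings)

dna <- DNAString("ATGCGGCATTATATGCGGCATTATATGCGGCATTAT")
counts <- rbind(A = c(1,2,2,1,2,1,1,2,1,2,1,0,0,2,2),
                T = c(1,1,1,2,1,1,1,0,2,2,2,2,2,0,1),
                G = c(0,2,1,1,0,0,1,0,2,2,1,2,0,1,1),
                C = c(1,1,2,2,0,1,2,2,1,0,0,1,2,2,0))

# Assumed fix: put the rows in the order of DNA_BASES (A, C, G, T)
counts <- counts[DNA_BASES, ]

pwm  <- counts + 1                        # pseudocount, as in the question
pwm  <- sweep(pwm, 2, colSums(pwm), "/")  # column-wise frequencies
pssm <- log2(pwm / 0.25)                  # log-odds against a uniform background

# Score every window of width ncol(pssm) along the sequence
starts <- 1:(length(dna) - ncol(pssm) + 1)
scores <- PWMscoreStartingAt(pssm, dna, starting.at = starts)

# Retrieve the highest-scoring window (which.max returns the first on ties)
best     <- which.max(scores)
best_seq <- subseq(dna, start = starts[best], width = ncol(pssm))
print(max(scores))
print(best_seq)
```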