I found this program in the Little Book of R for Bioinformatics. Link: https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter7.html
#finds start and stop codons in DNA sequence
#from Avril Coghlan, Little Book of R for Bioinformatics
library(Biostrings)
findPotentialStartsAndStops <- function(sequence)
{
  # Define a vector with the sequences of potential start and stop codons
  codons <- c("ATG", "TAA", "TAG", "TGA")
  # Find the number of occurrences of each type of potential start or stop codon
  for (i in 1:4)
  {
    codon <- codons[i]
    # Find all occurrences of codon "codon" in sequence "sequence"
    occurrences <- matchPattern(codon, sequence)
    # Find the start positions of all occurrences of "codon" in sequence "sequence"
    codonpositions <- attr(occurrences,"start")
    # Find the total number of potential start and stop codons in sequence "sequence"
    numoccurrences <- length(codonpositions)
    if (i == 1)
    {
      # Make a copy of vector "codonpositions" called "positions"
      positions <- codonpositions
      # Make a vector "types" containing "numoccurrences" copies of "codon"
      types <- rep(codon, numoccurrences)
    }
    else
    {
      # Add the vector "codonpositions" to the end of vector "positions":
      positions <- append(positions, codonpositions, after=length(positions))
      # Add the vector "rep(codon, numoccurrences)" to the end of vector "types":
      types <- append(types, rep(codon, numoccurrences), after=length(types))
    }
  }
  # Sort the vectors "positions" and "types" in order of position along the input sequence:
  indices <- order(positions)
  positions <- positions[indices]
  types <- types[indices]
  # Return a list variable including vectors "positions" and "types":
  mylist <- list(positions,types)
  return(mylist)
}
s1 <- "ACGGTATGTAATGTGA"
#tried as vector also s1 <- c("A", "C", "G", "G", "T", "A", "T", "G", "T", "A", "A", "T", "G", "T", "G", "A")
findPotentialStartsAndStops(s1)
If I pass the DNA sequence as a string, I get this error:
Error in .Method(..., na.last = na.last, decreasing = decreasing) :
argument 1 is not a vector
7 .Method(..., na.last = na.last, decreasing = decreasing)
6 eval(expr, envir, enclos)
5 eval(.dotsCall, env)
4 eval(.dotsCall, env)
3 standardGeneric("order")
2 order(positions)
1 findPotentialStartsAndStops(s1)
Called from: (function ()
{
.rs.breakOnError(TRUE)
})()
If I pass the DNA sequence as a vector, I get this error:
Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), : zero or more than one input sequence
8 .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW),
width(solved_SEW), get_seqtype_conversion_lookup("B", seqtype),
PACKAGE = "Biostrings")
7 .charToXString(seqtype, x, start, end, width)
6 XString(NULL, subject)
5 XString(NULL, subject)
4 .XString.matchPattern(pattern, subject, max.mismatch, min.mismatch,
with.indels, fixed, algorithm)
3 matchPattern(codon, sequence)
2 matchPattern(codon, sequence)
1 findPotentialStartsAndStops(s1)
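From the code, the program seems to expect the DNA sequence to be a character string.
So the problem is probably at the line occurrences <- matchPattern(codon, sequence). Is it something about whether the input is, or should be, a vector? But codon is already a vector: if I call class(codon), it is reported as one. I don't understand what is wrong.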
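Answer 0 (score: 0)
The code appears to be outdated. The current version of Biostrings
(2.38.2) probably returns a different kind of object than it did before. The line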
codonpositions <- attr(occurrences,"start")
should be replaced with
codonpositions <- start(occurrences)
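For reference, here is a minimal sketch of the function from the question with that one-line replacement applied. The name findStartsAndStopsFixed and the named list elements positions and types are illustrative choices, not from the book, and the loop is condensed slightly. It assumes a Biostrings version in which start() works on the object returned by matchPattern().

```r
library(Biostrings)

# Hypothetical name, chosen to keep this apart from the book's original function
findStartsAndStopsFixed <- function(sequence) {
  # Potential start codon (ATG) and stop codons (TAA, TAG, TGA)
  codons <- c("ATG", "TAA", "TAG", "TGA")
  positions <- integer(0)
  types <- character(0)
  for (codon in codons) {
    # Find all occurrences of this codon in the sequence
    occurrences <- matchPattern(codon, sequence)
    # Use the start() accessor instead of attr(occurrences, "start")
    codonpositions <- start(occurrences)
    positions <- c(positions, codonpositions)
    types <- c(types, rep(codon, length(codonpositions)))
  }
  # Sort both vectors by position along the input sequence
  indices <- order(positions)
  list(positions = positions[indices], types = types[indices])
}

result <- findStartsAndStopsFixed("ACGGTATGTAATGTGA")
print(result)
# positions: 6 9 11 14
# types:     "ATG" "TAA" "ATG" "TGA"
```

This covers only the single-string input. The character-vector form of s1 would still trigger the "zero or more than one input sequence" error unless it is first collapsed into one string, for example with paste(s1, collapse = "").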