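I'm new to R programming and I'm trying to write a program that produces the reverse and the complement of a sequence of bases. The purpose is to design DNA primers. I have the DNA sequence of a gene, made up of A, T, C and G, where the complements are A = T; T = A; C = G; G = C. I have figured out how to reverse the sequence, but for the complement I can only get an answer for a single base, not for the whole sequence, and I don't know how to combine the reverse and complement functions. Here is my code; I'm completely confused by it. Can anyone help me solve this? You would be a life saver!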
strReverse <- function(x)
sapply(lapply(strsplit(x, NULL), rev), paste, collapse="")
strReverse(c("ATCGGTCAATCGA"))
complement.base = function(base){
if(base == 'A' | base == 'a') print("T")
if(base == 'T' | base == 't') print("A")
if(base == 'G' | base == 'g') print("C")
if(base == 'C' | base == 'c') print("G")}
complement.base(base="A")
Answer 0 (score: 2)
You can do this efficiently with Rcpp:
library(Rcpp)
revComp.rcpp <- cppFunction(
"std::string comp(std::string x) {
const int n = x.length();
for (int i=0; i < n; ++i) {
if (x[i] == 'A' || x[i] == 'a') x[i] = 'T';
else if (x[i] == 'T' || x[i] == 't') x[i] = 'A';
else if (x[i] == 'G' || x[i] == 'g') x[i] = 'C';
else x[i] = 'G';
}
std::reverse(x.begin(), x.end());
return x;
}")
revComp.rcpp("ATCGGTCAATCGA")
# [1] "TCGATTGACCGAT"
The Rcpp function appears to be somewhat faster than the corresponding code in the Biostrings package (tested on a string of 13 million bases):
library(Biostrings)
x <- "ATCGGTCAATCGA"
big.x <- paste(rep(x, 1000000), collapse="")
big.x2 <- DNAString(big.x)
rev.biostr <- function(x) as.character(reverseComplement(x))
all.equal(revComp.rcpp(big.x), as.character(reverseComplement(big.x2)))
# [1] TRUE
library(microbenchmark)
microbenchmark(revComp.rcpp(big.x), as.character(reverseComplement(big.x2)))
# Unit: milliseconds
# expr min lq mean median uq max neval
# revComp.rcpp(big.x) 77.21618 78.44534 84.54397 82.21002 87.49367 123.8166 100
# as.character(reverseComplement(big.x2)) 144.13900 151.12869 170.73765 156.44300 164.41374 399.2948 100
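One thing to keep in mind about the Rcpp function above: its final `else` branch turns every character that is not A, T or G (in either case) into `G`, so an ambiguous base such as `N` would silently become `G`. If that is not what you want, a rough pure-R sketch of a lookup that leaves unknown characters untouched could look like the following. The names `comp_map` and `rev_comp_lookup` are made up for illustration, and it assumes a single string as input.

```r
# Hypothetical lookup table: names are input bases, values are their complements.
# Characters that are not listed here are passed through unchanged.
comp_map <- c(A = "T", T = "A", G = "C", C = "G",
              a = "T", t = "A", g = "C", c = "G")

rev_comp_lookup <- function(x) {
  chars <- strsplit(x, NULL)[[1]]       # one string -> vector of single characters
  hit <- chars %in% names(comp_map)     # which characters have a known complement
  chars[hit] <- comp_map[chars[hit]]    # replace only those
  paste(rev(chars), collapse = "")      # reverse and join back into one string
}

rev_comp_lookup("ATCGN")
# [1] "NCGAT"
```

This will be much slower than the compiled version on long sequences; it is only meant to show the pass-through behaviour.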
Answer 1 (score: 1)
I would actually consider using chartr from base R, and reversing the result (or the input) with the help of stringi.
myFun <- function(invec) {
require(stringi)
invec <- stri_reverse(invec)
chartr(old = "AaTtGgCc", new = "TTAACCGG", invec)
}
x <- "ATCGGTCAATCGA"
myFun(x)
# [1] "TCGATTGACCGAT"
Using @josilber's sample data, it performs very similarly to his Rcpp approach:
all.equal(myFun(big.x), revComp.rcpp(big.x))
# [1] TRUE
library(microbenchmark)
microbenchmark(myFun(big.x), revComp.rcpp(big.x))
# Unit: milliseconds
# expr min lq mean median uq max neval
# myFun(big.x) 349.5797 352.8197 362.3009 356.4484 362.7197 437.9556 100
# revComp.rcpp(big.x) 359.5485 363.8615 378.3465 368.3360 386.3734 444.2901 100
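If installing stringi is not an option, the same idea can probably be sketched in base R alone by combining `chartr` with the `strsplit`/`rev` reversal from the question. The function name `rev_comp_base` is hypothetical, and I have not benchmarked this variant, so treat it as a starting point rather than a tuned solution.

```r
# Hypothetical helper: reverse complement using base R only (no stringi).
rev_comp_base <- function(x) {
  # Complement every base in one pass; lower-case input maps to upper-case output,
  # matching the chartr call used in myFun.
  comp <- chartr(old = "AaTtGgCc", new = "TTAACCGG", x)
  # Reverse each string: split into characters, reverse, paste back together.
  vapply(strsplit(comp, NULL),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

rev_comp_base("ATCGGTCAATCGA")
# [1] "TCGATTGACCGAT"
```

Because both `chartr` and `strsplit` are vectorised, this also accepts a character vector of several sequences.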