I imported a FASTA-format alignment into R with:
read.dna(file.choose(),format="fasta",skip=0)
My alignment looks like this:
Seq1 ATGCGGGAATGGACTCATGCATCG
Seq2 ATTCGATCTTGCTAGCTAGCTCGT
Seq3 ATATCGATGTCGATCGATCGACGA
If I want to extract a single sequence (for example Seq2) from this alignment, what do I need to do?
Answer 0: (score: 1)
I don't know where read.dna() comes from (there are more than 6000 CRAN packages and nearly 1000 Bioconductor packages). You can use the Biostrings package instead:
library(Biostrings)
dna = readDNAStringSet("path/to.fasta")
and then do many useful things with the result, including those described in the quick reference. If in the end you want a single character vector, use
as.character(dna[1])
or
as.character(dna[names(dna) == "Seq3"])
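For the alignment in the question, selecting by name can also be written more directly. The following is a minimal sketch, assuming the Biostrings package is installed and that the FASTA headers are exactly Seq1, Seq2 and Seq3; the temporary file and the variable names are made up for illustration.

```r
library(Biostrings)

# Hypothetical FASTA file recreating the three sequences from the question
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">Seq1", "ATGCGGGAATGGACTCATGCATCG",
             ">Seq2", "ATTCGATCTTGCTAGCTAGCTCGT",
             ">Seq3", "ATATCGATGTCGATCGATCGACGA"),
           fasta_path)

dna <- readDNAStringSet(fasta_path)

# Single brackets keep the container: a DNAStringSet holding one sequence
seq2_set <- dna["Seq2"]

# Double brackets extract the element itself: a DNAString
seq2 <- dna[["Seq2"]]

# Convert to an ordinary R character string
seq2_chr <- as.character(seq2)
```

Indexing by name avoids building the logical vector names(dna) == "Seq2" by hand, but it depends on the names matching the FASTA header lines exactly.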
Answer 1: (score: 0)
I guess you are using the ape package. Using the example from ?read.dna:
library(ape)
cat(">No305",
"NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
">No304",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
">No306",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
file = "exdna.txt", sep = "\n")
ex.dna4 <- read.dna("exdna.txt", format = "fasta")
ex.dna4[dimnames(ex.dna4)[[1]]=='No304',]
#1 DNA sequences in binary format stored in a matrix.
#All sequences of same length: 40
#Labels: No304
#Base composition:
# a c g t
#0.475 0.300 0.025 0.200
as.character(ex.dna4[dimnames(ex.dna4)[[1]]=='No304'])
#[1] "a" "t" "t" "c" "g" "a" "a" "a" "a" "a" "c" "a" "c" "a" "c" "c" "c" "a" "c"
#[20] "t" "a" "c" "t" "a" "a" "a" "a" "a" "t" "t" "a" "t" "c" "a" "a" "c" "c" "a"
#[39] "c" "t"
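Applied to the alignment from the question, the same idea might look like the sketch below. It assumes the ape package is installed and that all sequences have the same length, so that read.dna() returns a matrix whose row names are the FASTA labels; the temporary file and variable names are hypothetical.

```r
library(ape)

# Hypothetical FASTA file recreating the three sequences from the question
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">Seq1", "ATGCGGGAATGGACTCATGCATCG",
             ">Seq2", "ATTCGATCTTGCTAGCTAGCTCGT",
             ">Seq3", "ATATCGATGTCGATCGATCGACGA"),
           fasta_path)

aln <- read.dna(fasta_path, format = "fasta")

# Select the row by its label; the trailing comma keeps all columns
seq2 <- aln["Seq2", ]

# as.character() yields one lower-case letter per site;
# paste() collapses them into a single string
seq2_string <- paste(as.character(seq2), collapse = "")
```

If the sequences had different lengths, read.dna() would return a list rather than a matrix, and the row-style indexing above would not apply; list-style indexing by name would be needed instead.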