I have an object like this:
str(sequ[1:2])
$ gi|254574545|ref|XM_002494337.1|:Class 'SeqFastadna' atomic [1:3288] a t g a ...
.. ..- attr(*, "name")= chr "gi|254574545|ref|XM_002494337.1|"
.. ..- attr(*, "Annot")= chr "gi|254574545|ref|XM_002494337.1| Pichia pastoris GS115 ER membrane protein involved in regulation of OLE1 transcription, acts w"| __truncated__
$ gi|254574543|ref|XM_002494336.1|:Class 'SeqFastadna' atomic [1:1614] a t g g ...
.. ..- attr(*, "name")= chr "gi|254574543|ref|XM_002494336.1|"
.. ..- attr(*, "Annot")= chr "gi|254574543|ref|XM_002494336.1| Pichia pastoris GS115 Subunit of the CCR4-NOT complex (PAS_FragD_0003) mRNA, complete cds"
How can I concatenate the data of all the names, "gi|254574545|ref|XM_002494337.1|" and "gi|254574543|ref|XM_002494336.1|"?
This is what I do:
library(seqinr)
sequ = read.fasta(file="../pure_fasta_pichia.fasta", strip.desc = TRUE)
seq_genome = c()
for (i in 1:length(sequ)){
seq_genome = c(seq_genome, sequ[[i]][1:length(sequ[[i]])])
}
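Aside: the loop above rebuilds seq_genome with c() on every iteration, which copies everything accumulated so far each time, so with thousands of entries it can become very slow. The sketch below builds two hypothetical toy entries with structure() to mimic what read.fasta() returns (single-letter character vectors with "name"/"Annot" attributes and class "SeqFastadna"; no real FASTA file is read). It suggests that base R's unlist() produces the same vector in a single call:

```r
# Hypothetical toy entries mimicking seqinr::read.fasta() output:
# character vectors of single bases with "name"/"Annot" attributes.
make_entry <- function(bases, name) {
  structure(strsplit(bases, "")[[1]],
            name = name, Annot = paste(name, "toy annotation"),
            class = "SeqFastadna")
}
sequ <- list(seq1 = make_entry("atga", "seq1"),
             seq2 = make_entry("atgg", "seq2"))

# Loop from the question (sequ[[i]][1:length(sequ[[i]])] is just sequ[[i]]);
# each c() call copies everything accumulated so far.
seq_genome <- c()
for (i in seq_along(sequ)) {
  seq_genome <- c(seq_genome, sequ[[i]])
}

# One call, no repeated copying; use.names = FALSE skips building
# names such as "seq11", "seq12", ... for every single base.
seq_genome2 <- unlist(sequ, use.names = FALSE)

identical(seq_genome, seq_genome2)  # TRUE for this toy data
```

Both versions drop the per-entry attributes and keep only the bases.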
I'm sure this can be done with lapply or something similar.
I tried this:
seq_genome = c()
seq_genome = lapply(sequ, function(x){seq_genome = c(seq_genome, x)})
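This attempt does not accumulate anything. The assignment inside the anonymous function creates a local seq_genome, the outer one stays NULL, and lapply() simply returns a list holding each call's return value. A small sketch with hypothetical toy entries (plain character vectors rather than real SeqFastadna objects) shows this, and shows that flattening that list with unlist() gives the intended vector:

```r
# Hypothetical stand-ins for read.fasta() output.
sequ <- list(seq1 = c("a", "t", "g", "a"), seq2 = c("a", "t", "g", "g"))

seq_genome <- c()
# Inside the function, seq_genome <- c(seq_genome, x) only creates a local
# variable; the function returns x, and lapply() collects those into a list.
res <- lapply(sequ, function(x) { seq_genome <- c(seq_genome, x) })

str(res)              # a list of 2 vectors, not one long vector
is.null(seq_genome)   # TRUE: the outer variable was never modified

# Flattening the per-entry results gives the single vector that was wanted.
flat <- unlist(res, use.names = FALSE)
```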
I would like to have:
seq_genome
a t g a ... a t g g ...
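My data is very large (length(sequ) returns [1] 4903), and each entry has more than 200 elements, usually around 3k. That's why my computer crashes, and why I'm not pasting the dput here.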
What should I do? I think the function inside my lapply is wrong, but I don't know how to improve it...
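For a sense of scale: 4903 entries of roughly 3k bases is about 15 million bases. With the default import every base is a separate element of a character vector, and on a 64-bit build each element is a pointer of about 8 bytes, whereas a single string needs about one byte per base. The sketch below illustrates this on 1e6 randomly generated toy bases (not the asker's data); exact sizes vary by platform, so treat the printed numbers as approximate:

```r
# Rough total from the numbers in the question (assumes ~3000 bases per entry).
total_bases <- 4903 * 3000   # 14709000

# Toy data: 1e6 random bases, one per element (the default read.fasta() layout).
set.seed(42)
letters_vec <- sample(c("a", "c", "g", "t"), 1e6, replace = TRUE)

# The same bases stored as one string (as.string = TRUE, then collapsed).
one_string <- paste0(letters_vec, collapse = "")

# On 64-bit builds: roughly 8 bytes per base vs roughly 1 byte per base.
print(object.size(letters_vec), units = "MB")
print(object.size(one_string), units = "MB")
```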
Answer 0 (score: 0)
There is no need for lapply here. Your problem is how you import the data. You need as.string = TRUE, and then you can simply concatenate the list with do.call and c.
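To see what the two steps do, here is a minimal sketch on a hypothetical toy list of three strings standing in for read.fasta(..., as.string = TRUE) output (real entries would additionally carry the SeqFastadna class and attributes):

```r
# Hypothetical stand-in for read.fasta(..., as.string = TRUE): one string per entry.
x <- list(a = "atga", b = "atgg", c = "ccta")

# do.call(c, x) is the same as c(a = x$a, b = x$b, c = x$c):
# a named character vector with one element per FASTA entry.
per_entry <- do.call(c, x)
per_entry   # c(a = "atga", b = "atgg", c = "ccta")

# paste0(..., collapse = "") joins all entries into one long sequence.
genome <- paste0(per_entry, collapse = "")
genome      # "atgaatggccta"
```

For a flat list of single strings like this one, unlist(x) returns the same named vector as do.call(c, x).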
# Some package data
dnafile <- system.file("sequences/malM.fasta", package = "seqinr")
# Use as.string = TRUE
x <- read.fasta(file = dnafile, as.string = TRUE)
# Only one FASTA entry in this file, so make several copies for the example
x <- setNames( c( x , x , x ) , letters[1:3] )
# Concatenate and collapse into a single sequence (str() is only there for display; drop it in real use)
str( paste0( do.call( c , x ) , collapse = "" ) )
# chr "atgaaaatgaataaaagtctcatcgtcctctgtttatcagcagggttactggcaagcgcgcctggaattagccttgccgatgttaactacgtaccgcaaaacaccagcgacgcgccagccattccat"| __truncated__
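The question asked for a vector of single letters (a t g a ... a t g g ...), whereas this answer produces one long string. If the letter-per-element form is really needed downstream, a minimal sketch (assuming genome holds the collapsed string; here a short hypothetical value) splits it back with base R's strsplit(). Note that this brings back the per-letter memory cost:

```r
genome <- "atgaatgg"   # hypothetical: result of the paste0() step above

# strsplit() returns a list with one element per input string;
# [[1]] takes the vector of single letters for our one string.
genome_letters <- strsplit(genome, split = "")[[1]]
genome_letters   # "a" "t" "g" "a" "a" "t" "g" "g"
```

seqinr's own s2c() should perform the same string-to-characters conversion.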