Collapse a list of DNAStringSets into a single DNAStringSet to use writeXStringSet() and write it to a FASTA file in R

Date: 2014-10-10 16:09:02

Tags: r fasta dna-sequence

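Doing bioinformatics in R here: I have a list of DNAStringSets (see below) and would like to use the writeXStringSet() function, which takes a DNAStringSet object as its argument, to save the sequences as a FASTA file. Does anyone know how to collapse a list of DNAStringSets into one single DNAStringSet object so that it can be passed as that argument?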

$NM_008866
  A DNAStringSet instance of length 13
     width seq                                                        names               
 [1]   693 ATGTGCGGCAACAACATGTCCGCTCCGA...GATAAGCTCCTACCTCCAATTGATTGA NM_008866
 [2]    72 ATGGATGGGCAGAAGCCTTTGCAGGTAT...AATACATCTGTCCACATGCCCCTGTGA NM_008866
 [3]   114 ATGGGCAGAAGCCTTTGCAGGTATCAAA...GAATATGGCTATGCCTTCTTGGTTTGA NM_008866
 [4]   213 ATGGCATTCCTTCTAACAGGATTATTTT...AGTGCCATGGAGATTGTGACCCTTTAG NM_008866
 [5]    63 ATGTCAAGCACTTCATTGATAAGCTCCT...TTGATTGACATCACTAAGAGGCCTTGA NM_008866
 ...   ... ...
 [9]   219 ATGGCCCTTCTATTGGGAGACCAGGCTT...CAGAGGCAGGCGGATCTCTGTCAATAG NM_008866
[10]   144 ATGTTATGCTTAAAACCAAATACTGTTC...CAGTCTCCTGTACAAATATTAAAATAA NM_008866
[11]    78 ATGTTGCAAAAATTATGGTTATTTCTGA...CCAACCAACCAAGAAGCACCTTTATAA NM_008866
[12]    75 ATGGTTATTTCTGAACGGTTGCTTTTCT...AGAAGCACCTTTATAAACAGGTGCTAA NM_008866
[13]    90 ATGTCTGGATTTAAAACAATTTCAAACA...AATTTACTTCAGTTATTCTATCTGTAA

$NM_001159750
  A DNAStringSet instance of length 9
   width seq                                                         names               
[1]   903 ATGGAGGACGAGGTGGTTCGCATTGCCA...ATGTGGAAATCGGTGGAAGTTCTGTTGA NM_001159750
[2]   105 ATGGACCATCAACTGATAAAGACCCTGA...AGAGAAGAAAGTTCCAGCAGCAATGTAA NM_001159750
[3]    75 ATGAGACAAATGCTCGAGATACATATGT...CCAAGCACTTCTGATTCTGTGCGATTAA NM_001159750
[4]    75 ATGATTATGTTGCAATTGGAGCTGATGA...ATTGAGGAAGCTATATATCAAGAAATAA NM_001159750
[5]   129 ATGAATGTGGAAATCGGTGGAAGTTCTG...GCCAGGCAACTCGTTTCCTTGCAAGTGA NM_001159750
[6]    63 ATGTGGAAATCGGTGGAAGTTCTGTTGA...AGAATTGGCAAAGTATCTGGACCATTAA NM_001159750
[7]   102 ATGTGTCCCACTTGTTTTGCTAGTAATA...TATAGTAAAGGCCACTTTTATAAATTAA NM_001159750
[8]   102 ATGGAAAACAATATGTCCATGTTAAAAG...CGGGAGGCAGAGGCAGGCGGATTTCTGA NM_001159750
[9]    75 ATGGATAATTTCTGTCACTTTAAAAATA...TAGTTTAAAAGTAATAAGGTTAAAATAG NM_001159750

$NM_011541
  A DNAStringSet instance of length 9
    width seq                                                         names               
[1]   906 ATGGAGGACGAGGTGGTTCGCATTGCCA...ATGTGGAAATCGGTGGAAGTTCTGTTGA NM_011541
[2]   108 ATGGACCATCAACTGATAAAGACCCTGA...GAAGAAAGTAGTTCCAGCAGCAATGTAA NM_011541
[3]    75 ATGAGACAAATGCTCGAGATACATATGT...CCAAGCACTTCTGATTCTGTGCGATTAA NM_011541
[4]    75 ATGATTATGTTGCAATTGGAGCTGATGA...ATTGAGGAAGCTATATATCAAGAAATAA NM_011541
[5]   129 ATGAATGTGGAAATCGGTGGAAGTTCTG...GCCAGGCAACTCGTTTCCTTGCAAGTGA NM_011541
[6]    63 ATGTGGAAATCGGTGGAAGTTCTGTTGA...AGAATTGGCAAAGTATCTGGACCATTAA NM_011541
[7]   102 ATGTGTCCCACTTGTTTTGCTAGTAATA...TATAGTAAAGGCCACTTTTATAAATTAA NM_011541
[8]   102 ATGGAAAACAATATGTCCATGTTAAAAG...CGGGAGGCAGAGGCAGGCGGATTTCTGA NM_011541
[9]    75 ATGGATAATTTCTGTCACTTTAAAAATA...TAGTTTAAAAGTAATAAGGTTAAAATAG NM_011541

1 answer:

Answer 0 (score: 1)

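Here is a very small reproducible example. Interestingly, this does not work if the elements of the list are named: in that case the call just returns the same list. Make sure to set names(dna_list) <- NULL first. I am not sure of the exact reason; perhaps someone else knows and would be willing to comment.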

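library(Biostrings)  # Bioconductor package that provides DNAStringSet
# Three small DNAStringSet objects to combine
x0 <- DNAStringSet(c("CTCCCAGTAT", "TTCCCGA", "TACCTAGAG"))
x1 <- DNAStringSet(c("AGGTCGT", "GTCAGTGGTCCCC", "CATTTTAGG"))
x2 <- DNAStringSet(c("TGCTAGCTA", "AGTCTTGC", "AGCTTTCGAG"))
# Unnamed list: the list elements themselves carry no names
dna_list <- list(x0, x1, x2)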
> dna_list
[[1]]
  A DNAStringSet instance of length 3
    width seq
[1]    10 CTCCCAGTAT
[2]     7 TTCCCGA
[3]     9 TACCTAGAG

[[2]]
  A DNAStringSet instance of length 3
    width seq
[1]     7 AGGTCGT
[2]    13 GTCAGTGGTCCCC
[3]     9 CATTTTAGG

[[3]]
  A DNAStringSet instance of length 3
    width seq
[1]     9 TGCTAGCTA
[2]     8 AGTCTTGC
[3]    10 AGCTTTCGAG

> do.call(c, dna_list)
  A DNAStringSet instance of length 9
    width seq
[1]    10 CTCCCAGTAT
[2]     7 TTCCCGA
[3]     9 TACCTAGAG
[4]     7 AGGTCGT
[5]    13 GTCAGTGGTCCCC
[6]     9 CATTTTAGG
[7]     9 TGCTAGCTA
[8]     8 AGTCTTGC
[9]    10 AGCTTTCGAG
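
To connect this back to the question, here is a minimal sketch of the whole round trip for a named list like the one in the question. The list names, the sequences, and the temporary output path are made up for illustration. As for why names matter, one plausible explanation (an assumption, not something confirmed in this thread) is that do.call() passes the list names on as argument names, so nothing is matched to the first argument that c() dispatches on, and the default c() simply returns a list. Using unname() on the list avoids changing the original object while leaving the names stored inside each DNAStringSet untouched.

```r
library(Biostrings)

# Hypothetical named list that mimics the structure shown in the question
dna_list <- list(
  NM_A = DNAStringSet(c(NM_A = "ATGTGA", NM_A = "ATGCCCTAA")),
  NM_B = DNAStringSet(c(NM_B = "ATGGGGTAG"))
)

# Drop only the list-level names, then concatenate into one DNAStringSet;
# the sequence names inside each DNAStringSet are kept
combined <- do.call(c, unname(dna_list))

# Write the combined set to a FASTA file (a temporary file in this sketch)
out_file <- tempfile(fileext = ".fasta")
writeXStringSet(combined, filepath = out_file, format = "fasta")

# Read the file back to check that every sequence was written
roundtrip <- readDNAStringSet(out_file)
```

The sequence names become the FASTA header lines, so records that share a name, as in the question's data, will have identical headers in the output file.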