Compiling separate files

Time: 2017-07-05 14:56:20

Tags: python biopython

Suppose I have files like these:

File1
    >TAIR:175_a
     ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
    >TAIR:175_b
     ZZZLAALSKDJFALKSDJFL;KJEIURALKDJFNVALKSDJFKZZZ
    >TAIR:175_c
     ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF

File2
    >TAIR:674_a
     ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
    >TAIR:674_b
     ASLALKSDGHDJGDGSDDFIEURALKSDHGLANVALKSDJGHKLJA

File3
    >TAIR:812_a
     KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ
    >TAIR:812_c
     ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA

File4
    >TAIR:975_b
     KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ

File5
    >TAIR:444_b
     QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA
    >TAIR:444_c
     QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA

I wrote this code to extract the names of all the sequences in the directory:

#!/usr/bin/env python
from Bio import SeqIO
filenames = ["file1","file2","file3"]
ids = []

for record in filenames:
    f = SeqIO.parse(record, 'fasta')
    ids.append(f.id)

print ids

The output looks like this:

 python search_list.py 
[<generator object parse at 0x7f32836018c0>, <generator object parse at 0x7f3283601910>, <generator object parse at 0x7f3283601960>]

My expected output is:

file_a
    >TAIR:175_a
     ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
    >TAIR:674_a
     ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA

file_b
    >TAIR:175_b
     ZZZLAALSKDJFALKSDJFL;KJEIURALKDJFNVALKSDJFKZZZ
    >TAIR:674_b
     ASLALKSDGHDJGDGSDDFIEURALKSDHGLANVALKSDJGHKLJA
    >TAIR:975_b
     KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ
    >TAIR:444_b
     QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA

file_c
    >TAIR:175_c
     ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
    >TAIR:812_c
     ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
    >TAIR:444_c
     QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA

Any suggestions for fixing this, so that I can open the files in the list "ids" and compile them?

2 answers:

Answer 0 (score: 2)

(Ignoring the print parentheses issue) your code fails on my system (Python 3.6.0; Biopython 1.69) with:

AttributeError: 'generator' object has no attribute 'id'

because SeqIO.parse() returns a generator. Your "expected output" is also wrong for this code. Given this code, what you should expect is:

['TAIR:175_a', 'TAIR:674_a', 'TAIR:812_a', 'TAIR:975_b', 'TAIR:175_b', 'TAIR:444_b', 'TAIR:175_c', 'TAIR:444_c']

In my environment, the following code gives you that:

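from Bio import SeqIO

filenames = ["file1.fasta", "file2.fasta", "file3.fasta"]

ids = []

for filename in filenames:
    # SeqIO.parse() returns a generator of SeqRecord objects
    records = SeqIO.parse(filename, 'fasta')

    # Iterate over the generator to reach each record's id
    for record in records:
        ids.append(record.id)

print(ids)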
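
To see why the inner loop is needed, here is a minimal sketch that does not use Biopython at all. `Record` and `fake_parse` are hypothetical stand-ins invented for illustration (they are not part of any library); they only mimic the fact that a parser hands back a lazy generator rather than a single record.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed record; only the .id field is modelled.
Record = namedtuple("Record", "id")


def fake_parse(ids):
    """Yield one Record per id, lazily, the way a file parser would."""
    for seq_id in ids:
        yield Record(seq_id)


gen = fake_parse(["TAIR:175_a", "TAIR:175_b"])

# The generator object itself has no .id attribute, so gen.id would raise
# AttributeError, just like the error reported above.
has_id = hasattr(gen, "id")

# Iterating over the generator is what gives access to each record.
ids = [record.id for record in gen]
```

After this runs, `has_id` is `False` and `ids` holds the two identifiers, which is the same pattern as the nested loop over `SeqIO.parse()`.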

Answer 1 (score: 0)

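You are getting that output because you asked Python to print an object, so by default it prints the object's memory address rather than its contents. You might be better off just using the standard Python open function (iterating over the list of files you want to check). You can then loop over each line in a file and add it to a list or whatever you like. Let me know if an example would be useful.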
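
The plain-`open` approach described here could look roughly like the sketch below. It is not the answerer's code: the function names `read_fasta`, `group_by_suffix` and `write_groups` are made up for illustration. It assumes the group is whatever follows the last underscore in the sequence ID (`a`, `b`, `c`), and that the output files are named `file_a`, `file_b`, and so on, as in the question.

```python
from collections import defaultdict


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file using plain open()."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_suffix(filenames):
    """Collect records from all files, keyed by the text after the last '_'."""
    groups = defaultdict(list)
    for filename in filenames:
        for header, seq in read_fasta(filename):
            # Use only the ID part of the header (text before any whitespace).
            seq_id = (header.split() or [""])[0]
            suffix = seq_id.rsplit("_", 1)[-1]
            groups[suffix].append((header, seq))
    return groups


def write_groups(groups, prefix="file_"):
    """Write one FASTA file per group, e.g. file_a, file_b, file_c."""
    for suffix, records in groups.items():
        with open(prefix + suffix, "w") as out:
            for header, seq in records:
                out.write(">{}\n{}\n".format(header, seq))
```

Calling `write_groups(group_by_suffix(["file1", "file2", "file3"]))` would then produce one output file per suffix. Records keep the order in which the input files are listed, and multi-line sequences are joined onto a single line.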