Suppose I have the following files:
File1
>TAIR:175_a
ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
>TAIR:175_b
ZZZLAALSKDJFALKSDJFL;KJEIURALKDJFNVALKSDJFKZZZ
>TAIR:175_c
ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
File2
>TAIR:674_a
ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
>TAIR:674_b
ASLALKSDGHDJGDGSDDFIEURALKSDHGLANVALKSDJGHKLJA
File3
>TAIR:812_a
KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ
>TAIR:812_c
ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
File4
>TAIR:975_b
KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ
File5
>TAIR:444_b
QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA
>TAIR:444_c
QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA
I wrote this code to extract the names of all the sequences in the directory:
#!/usr/bin/env python
from Bio import SeqIO
filenames = ["file1","file2","file3"]
ids = []
for record in filenames:
    f = SeqIO.parse(record, 'fasta')
    ids.append(f.id)
print ids
The output looks like this:
python search_list.py
[<generator object parse at 0x7f32836018c0>, <generator object parse at 0x7f3283601910>, <generator object parse at 0x7f3283601960>]
The output I expect is:
file_a
>TAIR:175_a
ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
>TAIR:674_a
ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
file_b
>TAIR:175_b
ZZZLAALSKDJFALKSDJFL;KJEIURALKDJFNVALKSDJFKZZZ
>TAIR:674_b
ASLALKSDGHDJGDGSDDFIEURALKSDHGLANVALKSDJGHKLJA
>TAIR:975_b
KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ
>TAIR:444_b
QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA
file_c
>TAIR:175_c
ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
>TAIR:812_c
ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
>TAIR:444_c
QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA
Any suggestions on how to solve this, that is, how to open the files behind the list "ids" and compile their sequences?
Answer 0 (score: 2)
(Ignoring the missing parentheses on print) your code breaks on my system (Python 3.6.0; Biopython 1.69) with:
AttributeError: 'generator' object has no attribute 'id'
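because SeqIO.parse() returns a generator, not a single record. Your "expected output" is also not something this code could produce. Given this code, what you should expect is a flat list of IDs like this: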
['TAIR:175_a', 'TAIR:674_a', 'TAIR:812_a', 'TAIR:975_b', 'TAIR:175_b', 'TAIR:444_b', 'TAIR:175_c', 'TAIR:444_c']
In my environment, the following code gives you that:
from Bio import SeqIO
filenames = ["file1.fasta", "file2.fasta", "file3.fasta"]
ids = []
for filename in filenames:
    records = SeqIO.parse(filename, 'fasta')
    for record in records:
        ids.append(record.id)
print(ids)
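The AttributeError is ordinary generator behaviour rather than anything specific to Biopython. Here is a minimal sketch that reproduces it without Biopython installed; `FakeRecord` and `fake_parse` are hypothetical stand-ins for a sequence record and for SeqIO.parse(), not real Biopython names:

```python
from collections import namedtuple

# Hypothetical stand-in for a sequence record; not a Biopython class.
FakeRecord = namedtuple("FakeRecord", ["id", "seq"])


def fake_parse(pairs):
    """Yield one record at a time, the way a lazy parser would."""
    for record_id, seq in pairs:
        yield FakeRecord(record_id, seq)


# Toy data: two IDs from the question with shortened sequences.
data = [("TAIR:175_a", "ALSKDJF"), ("TAIR:175_b", "ZZZLAAL")]

records = fake_parse(data)

# The generator itself has no `id`; only the records it yields do.
try:
    records.id
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Iterating over the generator reaches the individual records.
ids = [record.id for record in records]

print(error_message)
print(ids)
```

The attribute lookup fails on the generator object, while the loop (here a list comprehension) sees each record in turn, which is why the nested `for record in records` is needed.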
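Answer 1 (score: 0)
You are getting that output because you asked Python to print an object, so by default it prints the object's memory address rather than its contents. You may be better off just using the standard Python open method (iterating over the list of files you want to check). You can then loop over each line in a file and add it to a list or whatever you like. Let me know if an example would be useful.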
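As a rough sketch of the plain open() approach, the following reads each file line by line and groups records by the suffix after the last underscore in the header, which is roughly the grouping the question asks for. The helper names (`read_fasta`, `group_by_suffix`, `write_groups`), the `.fasta` extension and the `file_<suffix>.fasta` output names are assumptions made for illustration, and the demo writes two of the question's files into a temporary directory so that it runs on its own:

```python
import os
import tempfile
from collections import defaultdict


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file using plain open()."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_suffix(paths):
    """Group records by the text after the last underscore in the header."""
    groups = defaultdict(list)
    for path in paths:
        for header, sequence in read_fasta(path):
            # Assumption: every header ends in "_<suffix>", e.g. TAIR:674_a.
            suffix = header.rsplit("_", 1)[-1]
            groups[suffix].append((header, sequence))
    return dict(groups)


def write_groups(groups, out_dir):
    """Write one FASTA file per suffix, named file_<suffix>.fasta."""
    for suffix, records in groups.items():
        out_path = os.path.join(out_dir, "file_" + suffix + ".fasta")
        with open(out_path, "w") as out:
            for header, sequence in records:
                out.write(">" + header + "\n" + sequence + "\n")


# Demo on File2 and File3 from the question, written to a temp directory.
samples = {
    "file2.fasta": (
        ">TAIR:674_a\nASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA\n"
        ">TAIR:674_b\nASLALKSDGHDJGDGSDDFIEURALKSDHGLANVALKSDJGHKLJA\n"
    ),
    "file3.fasta": (
        ">TAIR:812_a\nKLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ\n"
        ">TAIR:812_c\nASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)

    grouped = group_by_suffix(paths)
    write_groups(grouped, tmp)

    grouped_ids = {s: [h for h, _ in recs] for s, recs in grouped.items()}
    output_files = sorted(n for n in os.listdir(tmp) if n.startswith("file_"))

print(grouped_ids)
print(output_files)
```

To use it on real data, pass your own list of paths to `group_by_suffix` and an existing directory to `write_groups`. Splitting on the last underscore is only a guess at the naming scheme, so adjust it if your IDs are structured differently.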