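I have 2 files: a text file containing a list of IDs, and a multi-FASTA file containing the FASTA sequences that correspond to the IDs in the first file. I have a Python script that selects the matching IDs from both files. It looks like this: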
I need to edit my script so that it also selects the sequence text that goes with each matching ID. Can you help? Thanks.
Answer 0 (score: 0)
If you only want to extract the entries listed in the names file, it is probably more efficient to read the names into memory first.
from Bio import SeqIO

# Read the wanted IDs into a set for fast membership tests
with open("names", "r") as lines:
    wanted = {line.strip() for line in lines if line.strip()}

for record in SeqIO.parse("fasta1.fasta", "fasta"):
    if record.id in wanted:
        print(record.id)   # ID
        print(record.seq)  # sequence
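If Biopython is not installed, the same "load the names first, then stream the FASTA file" idea can be sketched with the standard library alone. The `read_fasta` and `select_sequences` helpers below are hypothetical (they are not part of Biopython); `read_fasta` follows SeqIO's convention of using the first word of the header line as the ID and joins multi-line sequences, and `io.StringIO` objects stand in for the `names` and `fasta1.fasta` files.

```python
import io
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word after '>'."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split(None, 1)
            header = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


def select_sequences(names: Iterable[str], fasta_lines: Iterable[str]) -> Dict[str, str]:
    """Return {id: sequence} for the FASTA records whose id is listed in names."""
    wanted = {n.strip().lstrip(">") for n in names if n.strip()}
    return {rid: seq for rid, seq in read_fasta(fasta_lines) if rid in wanted}


names_file = io.StringIO("seq1\nseq3\n")
fasta_file = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGGGG\n>seq3 third\nCCCA\n")
result = select_sequences(names_file, fasta_file)
for rid, seq in result.items():
    print(rid)
    print(seq)
```

With real files you would pass `open("names")` and `open("fasta1.fasta")` instead of the `StringIO` objects.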
Answer 1 (score: 0)
See if this works:
from Bio import SeqIO

# Map each FASTA record ID to its sequence
seq_dict = {}
for record in SeqIO.parse("fasta1.fasta", "fasta"):
    seq_dict[record.id.strip()] = record.seq

with open("names", "r") as lines:
    for line in lines:
        name = line.strip().lstrip('>')  # FASTA headers start with '>', not '<'
        if name in seq_dict:
            print(name)            # ID
            print(seq_dict[name])  # sequence
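Note that this assumes the IDs read from the FASTA file are identical to the IDs in the names file. If that is not the case, please give more details (with examples) of what each of the two files contains.
Answer 2 (score: 0)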
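When the IDs do not line up, a common cause is that one side still carries the leading '>' or trailing description text. A minimal sketch of a normalizing step, assuming the names file may contain either bare IDs or full header lines; `normalize_id` and its `accession_only` option for UniProt-style `db|ACCESSION|ENTRY_NAME` IDs are hypothetical, and the sample IDs are made up.

```python
def normalize_id(raw: str, accession_only: bool = False) -> str:
    """Reduce a names-file line or FASTA header to a comparable ID (hypothetical helper)."""
    text = raw.strip().lstrip(">")
    parts = text.split(None, 1)  # drop any description after the first whitespace
    token = parts[0] if parts else ""
    if accession_only and token.count("|") >= 2:
        # UniProt-style 'db|ACCESSION|ENTRY_NAME': keep only the accession field
        token = token.split("|")[1]
    return token


print(normalize_id(">tr|ACC123|ENTRY_EXAMPLE some description"))
print(normalize_id("tr|ACC123|ENTRY_EXAMPLE", accession_only=True))
```

Applying the same function to both the names and the record IDs before comparing keeps the two sides consistent.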
After experimenting with Bio.SeqIO for a while, I concluded that @Bazingaa may be right. Modify the code like this:
from Bio import SeqIO

fasta = SeqIO.parse("fasta1.fasta", "fasta")
seq_dict = {}
for record in fasta:
    seq_dict[record.id] = record.description
# print(seq_dict)

# Print every name from the names file
with open("names", "r") as f:
    for line in f:
        line = line.strip()
        print(line)

# Print every record ID and its description
for cle, desc in seq_dict.items():
    print(cle)
    print(desc)
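You seem to be new to Python, so here is what I did:
for a, b in some_dict.items():
iterates over the dictionary's items, unpacking each (key, value) pair into the variables a and b. Hope this helps.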
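To see what `seq_dict` holds in the code above without needing the real files, here is a small stdlib sketch that builds the same kind of `{id: description}` mapping from a few made-up header lines and then iterates it with `.items()`. It assumes, as I understand Biopython's FASTA parser, that `record.id` is the first word of the header and `record.description` is the whole header line without the '>'.

```python
headers = [
    ">seq1 first example sequence",
    ">seq2",
]

seq_dict = {}
for header in headers:
    title = header[1:]  # drop the leading '>'
    parts = title.split(None, 1)
    record_id = parts[0] if parts else ""
    # Store the full title line as the "description", ID included
    seq_dict[record_id] = title

for key, desc in seq_dict.items():  # each item is a (key, value) tuple unpacked into two names
    print(key, "->", desc)
```

This prints `seq1 -> seq1 first example sequence` and `seq2 -> seq2`, which is why the description on its own already contains the ID.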
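Edit:
Here is a more "pythonic" version. I don't know much about FASTA, so I am assuming you want to read the lines from names, take the 'tr|something' part as the ID (without the leading '>'), and print the records from 'fasta1.fasta' whose IDs appear in names: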
from Bio import SeqIO

# Read all the names; 'with' takes care of closing the file afterwards
with open("names", "r") as f:
    names = {line.strip().lstrip('>') for line in f if line.strip()}
print("Names:", names)

for record in SeqIO.parse("fasta1.fasta", "fasta"):
    print("Record:", record.id)
    if record.id in names:
        print("Matching record:", record.id, record.seq, record.description)
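If the goal is a new FASTA file rather than printed output, the usual Biopython route is to collect the matching `SeqRecord` objects and pass them to `SeqIO.write(records, "out.fasta", "fasta")`. For finer control over the output, here is a stdlib sketch; `write_fasta` is a hypothetical helper that takes `(id, description, sequence)` triples, and the line width of 60 is an assumed default, not a requirement of the format.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str, str]], handle: TextIO, width: int = 60) -> int:
    """Write (id, description, sequence) triples as FASTA; return the number written."""
    count = 0
    for record_id, description, seq in records:
        # Avoid repeating the ID when the description already starts with it
        if description.startswith(record_id):
            title = description
        else:
            title = f"{record_id} {description}".rstrip()
        handle.write(f">{title}\n")
        for start in range(0, len(seq), width):  # wrap the sequence into fixed-width lines
            handle.write(seq[start:start + width] + "\n")
        count += 1
    return count


names = {"seq1", "seq3"}
records = [
    ("seq1", "seq1 first", "ACGTACGTAC"),
    ("seq2", "seq2 second", "GGGG"),
    ("seq3", "", "CCCA"),
]
out = io.StringIO()
n = write_fasta((r for r in records if r[0] in names), out, width=4)
print(out.getvalue())
```

Passing `open("out.fasta", "w")` instead of the `StringIO` writes the selection to disk.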