Converting a FASTA file into a list of names and sequences

Time: 2017-12-09 13:09:41

Tags: python list dictionary fasta

Goal: return a list of names and sequences

def ReadFastaFile(filename):
  # Collect the sequence of every record; header lines are skipped.
  sequences = []
  seqFragments = []
  with open(filename, 'r') as fileObj:
    for line in fileObj:
      if line.startswith('>'):
        # A header starts a new record: store the previous one first.
        if seqFragments:
          sequences.append(''.join(seqFragments))
        seqFragments = []
      else:
        seqFragments.append(line.rstrip())
  # Store the last record, which has no header after it.
  if seqFragments:
    sequences.append(''.join(seqFragments))
  return sequences

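I want to get a list of names and sequences. This code gives me a list containing only the sequences, because at first I thought I would not need the names for what I wanted to do. Now I realize it would be better to include them. If possible, a dictionary would also work, something like: dict = {'name': sequence}. Does anyone know how to change the code to achieve this?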

1 answer:

Answer 0: (score: 0)

This should be quite simple:

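def ReadFastaFile(filename):
  # Map each record name (the header line without '>') to its sequence.
  sequences = dict()
  seqFragments = []
  name = None  # name of the current record; avoids shadowing the built-in id
  with open(filename, 'r') as fileObj:
    for line in fileObj:
      if line.startswith('>'):
        # A header starts a new record: store the previous one first.
        if seqFragments:
          sequences[name] = ''.join(seqFragments)
        seqFragments = []
        name = line.rstrip()[1:]
      else:
        seqFragments.append(line.rstrip())
  # Store the last record, which has no header after it.
  if seqFragments:
    sequences[name] = ''.join(seqFragments)
  return sequences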
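
The question also asks for a list of names and sequences, which the dictionary version does not return directly. A minimal sketch of that variant follows. The function name read_fasta_records is hypothetical, and two behaviours are assumptions rather than something stated in the question: lines before the first header are ignored, and a header with no sequence lines is kept with an empty sequence.

```python
def read_fasta_records(filename):
    """Return a list of (name, sequence) tuples in file order.

    Hypothetical helper: the name and the handling of edge cases
    are illustrative assumptions, not part of the original answer.
    """
    records = []
    name = None       # name of the record currently being read
    fragments = []    # sequence lines collected for that record
    with open(filename, 'r') as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith('>'):
                # A header closes the previous record, if there was one.
                if name is not None:
                    records.append((name, ''.join(fragments)))
                name = line[1:]
                fragments = []
            elif name is not None:
                # Sequence lines before the first header are ignored.
                fragments.append(line)
    # Store the last record, which has no header after it.
    if name is not None:
        records.append((name, ''.join(fragments)))
    return records
```

Calling dict(read_fasta_records(filename)) gives the dictionary form when every name is unique. The list form keeps the file order and does not lose records that share a name.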