Question

I have two different files: one is a FASTA file, and the other is a txt file generated by dumping a dictionary with json.

file_A looks like this:

{
    "gene_1005 ['gene description_B']":2,
    "gene_1009 ['gene description_C']":1,
    "gene_104 ['gene description_D']":2,
    "gene_1046 ['gene description_A']":1,
}
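
Because file_A was written with json, it can be read back with json.loads (or json.load for an open file) instead of being split by hand. A minimal sketch, assuming the real file is valid JSON (json.dump never writes a trailing comma after the last entry); the helper name wanted_genes and its wanted_count parameter are hypothetical:

```python
import json

def wanted_genes(json_text, wanted_count=2):
    """Return the keys of the JSON object whose value equals wanted_count."""
    counts = json.loads(json_text)
    return {gene for gene, count in counts.items() if count == wanted_count}

# The sample from file_A, without the trailing comma
sample = '''{
    "gene_1005 ['gene description_B']": 2,
    "gene_1009 ['gene description_C']": 1,
    "gene_104 ['gene description_D']": 2,
    "gene_1046 ['gene description_A']": 1
}'''

print(sorted(wanted_genes(sample)))
```

For the file on disk, `with open('file_A.txt') as f: counts = json.load(f)` gives the same dictionary.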

file_B looks like this:

>gene_1005 ['gen description_B']
ATGTGGATCCGCCCGTTGCAGGCGGAACTGAGCGATAACACGCTGGCTTTGTATGCGCCAAACCGTTTTGTGCTCGA
>gene_2 ['gene description_C']
ATGAAATTTACCGTTGAACGTGAACATTTATTAAAACCGCTGCAACAGGTGAGTGGCCCATTAGGTGGCCGCCCAAC
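
Each FASTA record is a header line starting with '>' followed by one or more sequence lines. A minimal sketch of a parser that maps each header (without the '>') to its joined sequence; read_fasta is a hypothetical helper, it assumes headers are unique, and the example sequences are shortened from the file_B sample:

```python
def read_fasta(lines):
    """Map each FASTA header (without '>') to its sequence, joining wrapped lines."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence line belonging to the last header
    return {h: ''.join(parts) for h, parts in records.items()}

example = [
    ">gene_1005 ['gen description_B']\n",
    "ATGTGGATCC\n",
    "GCCCGTTGCA\n",
    ">gene_2 ['gene description_C']\n",
    "ATGAAATTTA\n",
]
for header, seq in read_fasta(example).items():
    print(header, len(seq))
```

Whatever approach is used, a header only matches a file_A key if the two strings are identical character for character.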

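What I want is to create a new FASTA file containing only the genes whose value in file_A is 2. I tried the code below, but I'm stuck. It prints words[0], the gene name, but it won't print words[1], which should be the number. It raises the error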

'超出范围'

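import json

def readlines():
    input_file = open('file_A.txt')
    lines = input_file.readlines()
    print(lines[1])
    for line in lines:
        # split each line, not the whole list of lines
        words = line.split(':')
        print(words[0])
        # lines such as "{" and "}" contain no ':', so words has one element
        # and words[1] raises IndexError: list index out of range
        print(words[1])
    input_file.close()

readlines()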
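
To keep a line-by-line approach, the lines without a key/value pair ('{', '}', blank lines) have to be skipped before indexing. A minimal sketch that does this and splits on the last ':' only, so a colon inside a description would not break it; parse_counts is a hypothetical name, and json.load remains the simpler option when the file is valid JSON:

```python
def parse_counts(lines):
    """Parse lines like '"gene_1005 [...]":2,' into a {gene: count} dict."""
    counts = {}
    for line in lines:
        line = line.strip().rstrip(',')
        if ':' not in line:
            continue  # '{', '}' and blank lines have no key/value pair
        key, value = line.rsplit(':', 1)  # split on the last ':' only
        counts[key.strip().strip('"')] = int(value)
    return counts

sample_lines = [
    '{\n',
    '    "gene_1005 [\'gene description_B\']":2,\n',
    '    "gene_1009 [\'gene description_C\']":1,\n',
    '}\n',
]
counts = parse_counts(sample_lines)
print([gene for gene, n in counts.items() if n == 2])
```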

Can anyone help? Thanks.

Answer 1

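It seems people like to downvote without explaining why or offering a suggestion, which is what happened to this post. Since the downvoter didn't bother to suggest anything, I'll post an answer.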

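import json

# Load the dictionary written to file_A and keep only genes whose value is 2
with open('file_A.txt') as dict_file:
    counts = json.load(dict_file)
my_dict = {gene: count for gene, count in counts.items() if count == 2}

input_file = open('file.fa', 'r')
output_file = open('wanted_genes.fa', 'w')
skip = 1  # skip anything before the first header
for line in input_file:
    if line[0] == '>':
        # header line: drop the leading '>' and the trailing newline
        geneID = line[1:].rstrip('\n')
        if geneID in my_dict:
            output_file.write(line)
            skip = 0
        else:
            skip = 1
    else:
        # sequence line: copy it only if its header was wanted
        if not skip:
            output_file.write(line)
input_file.close()
output_file.close()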
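
The same skip-flag idea can be wrapped in a generator that works on any iterable of lines, which makes it easy to try on a small example before running it on real files. A minimal sketch; filter_fasta is a hypothetical name, and wanted is assumed to be a set of full header strings (for example the file_A keys whose value is 2):

```python
def filter_fasta(lines, wanted):
    """Yield header and sequence lines only for records whose header is in wanted."""
    keep = False  # nothing is kept until a wanted header has been seen
    for line in lines:
        if line.startswith('>'):
            keep = line[1:].rstrip('\n') in wanted
        if keep:
            yield line

fasta = [
    ">gene_1005 ['gene description_B']\n",
    "ATGTGGATCC\n",
    ">gene_2 ['gene description_C']\n",
    "ATGAAATTTA\n",
]
wanted = {"gene_1005 ['gene description_B']"}
print(''.join(filter_fasta(fasta, wanted)), end='')
```

With real files, `with open('file.fa') as src, open('wanted_genes.fa', 'w') as dst: dst.writelines(filter_fasta(src, wanted))` writes the filtered records and closes both files automatically.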

Extracting data from two different files to generate a FASTA file
