Python, GFF file, FASTA section, getting past the regular expression

Time: 2019-05-07 08:42:23

Tags: python regex fasta gff

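I have a GFF file. As I read it line by line, I need to collect some information, notably the chromosome number and the appropriate range. The first part of the code works fine. Where I run into trouble is the FASTA part: the if/then block does not work properly. Even though withinFASTASection is True, because I am at the right place in the file, my output is the last line of nucleotides and my m value is None. I know it is not my regex. I think it is a scope issue or some other fairly obvious problem that I am having a hard time seeing. I have Googled and searched Stack Overflow but have not found a solution yet. Any suggestions would be appreciated.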

    import argparse
    import re

    def main():
        parser = argparse.ArgumentParser(description='Put a description of your script here')
        parser.add_argument('-source_gff', '--input_file', type=str, required=True, help='Path to an input file to be read')
        parser.add_argument('-type', '--type', type=str, required=True, help='type to look for')
        parser.add_argument('-attribute', '--attribute', type=str, required=True, help='column 9')
        parser.add_argument('-value', '--value', type=str, required=True, help='column 9')

        args = parser.parse_args()
        withinFASTASection = False
        key = ''
        wholething = list()

        #iterate through file
        for line in open(args.input_file):
            line = line.rstrip()

            if line.startswith('##FASTA'):
                withinFASTASection = True

            #tab delimiter
            cols = line.split("\t")

            #will skip below code if not 9 columns
            if len(cols) != 9:
                continue

            #find line that meets args requirements
            if (cols[2] == args.type
                    and args.attribute in cols[8]
                    and args.value in cols[8]):

                gene = args.type
                geneID = args.attribute
                geneValue = args.value

                #count to beginning and end of region
                start = int(cols[3])
                end = int(cols[4])

                print(">{}:{}:{}".format(gene, geneID, geneValue))
                key = cols[0]
                print(withinFASTASection)    #False here

        #get to the FASTA section
        #find line with >key

        print(withinFASTASection)    #True here
        reg = (r'{0}'.format(key))

        #correct key & reg
        print (key)
        print (reg)

        if (withinFASTASection):
            m = re.search(reg, line)
            # m is None
            print (m)
            if m == key:
                #STILL LAST SEQ OF FILE
                print (line)

            # start adding nts to list line by line until whitespace
            # wholething.append(line)

    if __name__ == '__main__':
        main()
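One possible reading of the symptoms is that the FASTA check runs once, after the `for` loop has finished, so it only ever sees the final line of the file. Below is a minimal sketch of a single-pass alternative in which the header test happens inside the loop. The helper name `extract_region`, its arguments, and its return shape are hypothetical and only for illustration. It assumes the matching feature line appears before `##FASTA` and that the FASTA header ID equals column 1 of the GFF line.

```python
import io


def extract_region(handle, feature_type, attribute, value):
    """Return (seqid, start, end, subsequence) for the first matching feature.

    Hypothetical helper for illustration; returns None if nothing matches.
    """
    within_fasta = False
    target = None        # (seqid, start, end) of the matching feature
    current_id = None    # ID of the FASTA record currently being read
    chunks = []          # sequence lines belonging to the target record

    for line in handle:
        line = line.rstrip()
        if line.startswith('##FASTA'):
            within_fasta = True
            continue

        if not within_fasta:
            # Annotation section: look for the first 9-column feature line
            # that satisfies the three criteria.
            cols = line.split('\t')
            if (target is None and len(cols) == 9
                    and cols[2] == feature_type
                    and attribute in cols[8] and value in cols[8]):
                target = (cols[0], int(cols[3]), int(cols[4]))
            continue

        # FASTA section: this test runs once per line, inside the loop.
        if line.startswith('>'):
            current_id = line[1:].split()[0] if len(line) > 1 else None
        elif target is not None and current_id == target[0]:
            chunks.append(line)

    if target is None:
        return None
    seqid, start, end = target
    sequence = ''.join(chunks)
    # GFF coordinates are 1-based and inclusive, hence start - 1.
    return seqid, start, end, sequence[start - 1:end]


# Made-up sample data, not taken from the real file.
sample = (
    "##gff-version 3\n"
    "chrI\tSGD\tgene\t3\t6\t.\t+\t.\tID=YAR003;Name=x\n"
    "##FASTA\n"
    ">chrI\n"
    "ACGTAC\n"
    "GTTT\n"
    ">chrII\n"
    "TTTTTTTT\n"
)
result = extract_region(io.StringIO(sample), 'gene', 'ID', 'YAR003')
print(result)
```

Collecting the sequence lines while iterating avoids having to reopen or rewind the file, at the cost of holding one chromosome in memory.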

Current output from the print statements:

gene:ID:YAR003    Correct output
True              Correct WithinFASTASection  
chrI              Correct key    
>chrI             Correct Reg  
None              Incorrect m value

Example lines I am looking for in the file:

##FASTA           WithinFASTASection = True once this line is read
>chrI             Should trigger code if m == key, but m is None
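A note on the `m == key` test: `re.search` returns either a match object or `None`, never a string, so comparing its result to `key` is always False even when the pattern matches. The sketch below shows one way to test a header line instead. The helper name `header_matches` is hypothetical, and anchoring the pattern with `^>` and `\b` is an assumption about how the headers are laid out.

```python
import re


def header_matches(line, key):
    """Return True if `line` is a FASTA header for sequence `key`.

    Hypothetical helper; assumes headers look like '>chrI' with an
    optional description after the ID.
    """
    pattern = r'^>' + re.escape(key) + r'\b'
    return re.search(pattern, line) is not None


key = 'chrI'
m = re.search('>' + re.escape(key), '>chrI')
print(m == key)      # False: a match object is never equal to a str
print(m.group(0))    # '>chrI', the matched text

print(header_matches('>chrI', key))     # True
print(header_matches('>chrII', key))    # False, thanks to the \b anchor
print(header_matches('ACGTACGT', key))  # False, a sequence line
```

For a fixed prefix like this, `line.startswith('>' + key)` would also work without a regex, though it would accept `>chrII` as well.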

*****EDIT     I believe the indentation is correct. If I indent print(withinFASTASection), it only returns False. I will try experimenting with more indentation.

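******EDIT, POTENTIAL CLUE     When I do successfully grab a line, it is the last line of the file that I get as output. I have run into this problem in another program as well. I cannot seem to grab the first few lines, or specific lines, of a file.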
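The "always the last line" symptom is consistent with how Python's `for` loop works: the loop variable stays bound after the loop ends and holds whatever was assigned last. A small sketch with made-up data (the in-memory file below is an assumption, not the real GFF):

```python
import io

# Stand-in for the real file.
handle = io.StringIO('##FASTA\n>chrI\nACGT\nTTGA\n')

seen_inside = []
for line in handle:
    line = line.rstrip()
    seen_inside.append(line)   # inside the loop, every line is visible

# After the loop, `line` still refers to the final line only.
last_after_loop = line
print(seen_inside)       # ['##FASTA', '>chrI', 'ACGT', 'TTGA']
print(last_after_loop)   # 'TTGA'
```

Any test that needs to see a particular line, such as a `>chrI` header, therefore has to run inside the loop body.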

0 answers:

No answers yet