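I have a GFF file. As I read the file line by line, I need to pull out some information, notably the chromosome number and the appropriate range. The first part of the code works fine. Where I run into trouble is the if/then portion for the FASTA section, which doesn't work properly. Although I get True for being at the right place in the file, my output is the last line of nucleotides, and my m value is None. I know it's not my regex. I think it's a scoping issue or something else fairly obvious (that I'm having a hard time seeing). I've searched Google and Stack Overflow but haven't found a solution yet. Any advice would be greatly appreciated.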
import argparse
import re

def main():
    parser = argparse.ArgumentParser(description='Put a description of your script here')
    parser.add_argument('-source_gff', '--input_file', type=str, required=True, help='Path to an input file to be read')
    parser.add_argument('-type', '--type', type=str, required=True, help='type to look for')
    parser.add_argument('-attribute', '--attribute', type=str, required=True, help='column 9')
    parser.add_argument('-value', '--value', type=str, required=True, help='column 9')
    args = parser.parse_args()
    withinFASTASection = False
    key = ''
    wholething = list()
    #iterate through file
    for line in open(args.input_file):
        line = line.rstrip()
        if line.startswith('##FASTA'):
            withinFASTASection = True
        #tab delimiter
        cols = line.split("\t")
        #will skip below code if not 9 columns
        if len(cols) != 9:
            continue
        #find line that meets args requirements
        if (cols[2] == args.type
                and args.attribute in cols[8]
                and args.value in cols[8]):
            gene = args.type
            geneID = args.attribute
            geneValue = args.value
            #count to beginning and end of region
            start = int(cols[3])
            end = int(cols[4])
            print(">{}:{}:{}".format(gene, geneID, geneValue))
            key = cols[0]
            print(withinFASTASection) #False here
    #get to the FASTA section
    #find line with >key
    print(withinFASTASection) #True here
    reg = (r'{0}'.format(key))
    #correct key & reg
    print (key)
    print (reg)
    if (withinFASTASection):
        m = re.search(reg, line)
        # m is None
        print (m)
        if m == key:
            #STILL LAST SEQ OF FILE
            print (line)
            # start adding nts to list line by line until whitespace
            # wholething.append(line)

if __name__ == '__main__':
    main()
Current output from the print statements:
gene:ID:YAR003 Correct output
True Correct WithinFASTASection
chrI Correct key
>chrI Correct Reg
None Incorrect m value
Example lines I'm trying to find in the file:
##FASTA WithinFASTASection = True once this line is read
>chrI Should trigger code if m == key, but m is None
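On the comparison itself, here is a minimal sketch of matching a header line like the one above. It assumes the sequence name is the first token right after `>`; the pattern and the names `header` and `found` are illustrative and not taken from the script. `re.search` returns a match object or None, never a string, so the matched text has to be pulled out with `.group()` before it can be compared with `key`.

```python
import re

key = "chrI"       # chromosome name, as taken from column 1 of the GFF line
header = ">chrI"   # example FASTA header line from the question

# Capture the first whitespace-free token after the leading ">".
m = re.search(r"^>(\S+)", header)

# A match object never equals a string, so compare the captured group instead.
found = m is not None and m.group(1) == key
print(found)
```

With this kind of check, `m == key` would still be False even when the header matches, which is why the captured group is compared instead.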
*****EDIT: I believe the indentation is correct. If I indent print(withinFASTASection), it only returns False. I will try experimenting with more indentation.
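One possible explanation for the indentation behaviour, sketched with made-up data: the `len(cols) != 9` check skips every line of the FASTA section, so code placed inside the loop after that check only ever sees GFF feature lines, while code placed after the loop only sees whatever `line` held last. The list `lines` and the name `seen_after_check` are hypothetical and exist only for this illustration.

```python
# Hypothetical miniature of the file: one GFF feature line, then a FASTA section.
lines = [
    "chrI\tSGD\tgene\t1\t10\t.\t+\t.\tID=YAR003",
    "##FASTA",
    ">chrI",
    "ACGT",
]

within_fasta = False
seen_after_check = []
for line in lines:
    if line.startswith("##FASTA"):
        within_fasta = True
    cols = line.split("\t")
    if len(cols) != 9:
        continue  # every FASTA line has a single column, so it is skipped here
    # Only 9-column GFF lines reach this point, before the flag is ever set.
    seen_after_check.append((cols[0], within_fasta))

print(seen_after_check)  # the feature line, with the flag still False
print(line)              # after the loop, line is simply the last line read
```

If this is what is happening, the FASTA handling would need to sit inside the loop and before the nine-column check.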
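******POTENTIAL CLUE EDIT: When I do manage to grab a line, it is the last line of the file that I get as output. I've run into this problem in another program as well. I can't seem to get hold of the first few lines, or certain specific lines, of the file.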
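Following that clue, here is a rough sketch of how the sequence lookup could be done inside the loop rather than after it. The helper `extract_region` and the sample text are hypothetical. It assumes the FASTA header is `>` followed by the sequence name, that the wanted record appears once, and that `start` and `end` are 1-based and inclusive, as in GFF columns 4 and 5.

```python
import io

def extract_region(handle, seqid, start, end):
    """Return bases start..end (1-based, inclusive) of seqid from the ##FASTA section."""
    in_fasta = False     # set once the ##FASTA directive has been read
    collecting = False   # True while inside the wanted FASTA record
    chunks = []
    for line in handle:
        line = line.rstrip()
        if line.startswith("##FASTA"):
            in_fasta = True
            continue
        if not in_fasta:
            continue  # still in the GFF feature section
        if line.startswith(">"):
            if collecting:
                break  # the next record begins, so the wanted one is complete
            # Compare the first token after ">" with the wanted sequence name.
            collecting = line[1:].split(None, 1)[:1] == [seqid]
            continue
        if collecting:
            chunks.append(line)
    return "".join(chunks)[start - 1:end]

sample = (
    "chrI\tSGD\tgene\t3\t6\t.\t+\t.\tID=YAR003\n"
    "##FASTA\n"
    ">chrI\n"
    "ACGTAC\n"
    "GTTT\n"
    ">chrII\n"
    "GGGG\n"
)
print(extract_region(io.StringIO(sample), "chrI", 3, 6))
```

In the real script the file would have to be read a second time (or `start`, `end`, and `key` remembered from the feature pass) before calling something like this, since the feature line comes before the FASTA section.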