Using Biopython SeqIO to process a file passed on the command line

Asked: 2016-08-05 14:20:10

Tags: python biopython getopt

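This is my first attempt at using proper command-line arguments instead of a quick-and-dirty sys.argv[], and at writing more 'proper' Python scripts. For some reason I can't figure out, the script seems to object when I try to use an input file given on the command line.

The script is meant to take an input file and some numeric indices, then slice out a subset region of the file. However, I keep getting an error saying that the variable I assign the file to is not defined: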

joehealey@7c-d1-c3-89-86-2c:~/Documents/Warwick/PhD/Scripts$ python slice_genbank.py --input PAU_06042014.gbk -o test.gbk -s 3907329 -e 3934427
Traceback (most recent call last):
  File "slice_genbank.py", line 70, in <module>
    sub_record = record[start:end]
NameError: name 'record' is not defined

Here's the code. Where am I going wrong? (I'm sure it's something simple):

#!/usr/bin/python

# This script is designed to take a genbank file and 'slice out'/'subset'
# regions (genes/operons etc.) and produce a separate file.

# Based upon the tutorial at http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc44

# Set up and handle arguments:
from Bio import SeqIO
import getopt


def main(argv):
    record = ''
    start = ''
    end = ''
    try:
        opts, args = getopt.getopt(argv, 'hi:o:s:e:', [
                                                   'help',
                                                   'input=',
                                                   'outfile=',
                                                   'start=',
                                                   'end='
                                                   ]
                              )
        if not opts:
            print "No options supplied. Aborting."
            usage()
            sys.exit(2)
    except getopt.GetoptError:
        print "Some issue with commandline args.\n"
        usage()
        sys.exit(2)

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit(2)
        elif opt in ("-i", "--input"):
            filename = arg
            record = SeqIO.read(arg, "genbank")
        elif opt in ("-o", "--outfile"):
            outfile = arg
        elif opt in ("-s", "--start"):
            start = arg
        elif opt in ("-e", "--end"):
            end = arg
    print("Slicing " + filename + " from " + str(start) + " to " + str(end))

def usage():
    print(
"""
This script 'slices' entries such as genes or operons out of a genbank,
subsetting them as their own file.

Usage:
python slice_genbank.py -h|--help -i|--input <genbank> -o|--output <genbank> -s|--start <int> -e|--end <int>"

Options:

-h|--help       Displays this usage message. No options will also do this.
-i|--input      The genbank file you which to subset a record from.
-o|--outfile    The file name you wish to give to the new sliced genbank.
-s|--start      An integer base index to slice the record from.
-e|--end        An integer base index to slice the record to.
"""
      )

#Do the slicing
sub_record = record[start:end]
SeqIO.write(sub_record, outfile, "genbank")

if __name__ == "__main__":
 main(sys.argv[1:])

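There may also be a problem with the SeqIO.write syntax, but I haven't got that far yet.

Edit:

I also forgot to mention that when I use `record = SeqIO.read("file.gbk", "genbank")` and write the file name directly into the script, it works fine.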

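1 answer:

Answer 0 (score: 1)

As noted in the comments, your variable record is defined only inside the function main() (the same goes for start and end), so it is not visible to the rest of the program. You can return the values like this: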

def main(argv):
    ...
    ...
    return record, start, end

Your call to main() could then look like this:

record, start, end = main(sys.argv[1:])
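The same pattern can be tried without Biopython. The following is only a minimal sketch, assuming a cut-down main() that handles just -s and -e. The int() conversion is an addition not mentioned above: getopt hands back every option value as a string, and slice indices need to be integers.

```python
import getopt


def main(argv):
    """Parse -s/--start and -e/--end and return them to the caller."""
    start = end = None
    opts, _ = getopt.getopt(argv, "s:e:", ["start=", "end="])
    for opt, arg in opts:
        if opt in ("-s", "--start"):
            start = int(arg)  # getopt yields strings; slices need ints
        elif opt in ("-e", "--end"):
            end = int(arg)
    # Without this return, start and end would stay local to main().
    return start, end


# The caller binds the returned values to names in its own scope.
start, end = main(["-s", "3907329", "--end", "3934427"])
print(start, end)  # prints: 3907329 3934427
```

Note that outfile is also assigned inside main() in the question's script and used outside it, so it would need to be returned in the same way.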

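Alternatively, you can move the main functionality (the slicing and writing) into the main() function, as you did.

(Another approach is to define the variables in the main program and use the global keyword inside the function, but this is not recommended.)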
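A rough sketch of that second option, assuming Biopython is installed and the input file holds exactly one record (SeqIO.read raises an error otherwise). The required-option check and the int() conversions are illustrative additions, not part of the original script; the help option and usage() are left out for brevity.

```python
import getopt
import sys


def main(argv):
    """Parse the options, slice the record and write it, all in one scope."""
    from Bio import SeqIO  # requires Biopython

    infile = outfile = None
    start = end = None
    opts, _ = getopt.getopt(
        argv, "i:o:s:e:", ["input=", "outfile=", "start=", "end="]
    )
    for opt, arg in opts:
        if opt in ("-i", "--input"):
            infile = arg
        elif opt in ("-o", "--outfile"):
            outfile = arg
        elif opt in ("-s", "--start"):
            start = int(arg)  # getopt yields strings; slices need ints
        elif opt in ("-e", "--end"):
            end = int(arg)
    if infile is None or outfile is None:
        sys.exit("Both --input and --outfile are required.")

    record = SeqIO.read(infile, "genbank")
    # record, start, end and outfile all live in main()'s scope here,
    # so no NameError can occur.
    sub_record = record[start:end]
    SeqIO.write(sub_record, outfile, "genbank")
    return sub_record
```

The entry point stays as `if __name__ == "__main__": main(sys.argv[1:])`. That line needs `import sys`, which the question's script does not have.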