Creating a dict inside a function doesn't work, but it does outside the function

Time: 2015-03-02 13:54:46

Tags: python regex dictionary fasta

I've run into a problem that I can't track down and fix.

FASTA = >header1
         ATCGATCGATCCCGATCGACATCAGCATCGACTAC
         ATCGACTCAAGCATCAGCTACGACTCGACTGACTACGACTCGCT
        >header2
         ATCGATCGCATCGACTACGACTACGACTACGCTTCGTATCAGCATCAGCT
         ATCAGCATCGACGACGACTAGCACTACGACTACGACGATCCCGATCGATCAGCT

def dnaSequence():
    '''
    This function makes a dict called DNAseq by reading the fasta file 
    given as first argument on the command line
    INPUT: Fasta file containing strings
    OUTPUT: key is header and value is sequence
    '''

    DNAseq = {}
    for line in FASTA:
        line = line.strip()
        if line.startswith('>'):
            header = line
            DNAseq[header] = ""
        else:
            seq = line
            DNAseq[header] = seq

    return DNAseq



def digestFragmentsWithOneEnzyme(dnaSequence):
    '''
    This function digests the sequence from DNAseq into smaller parts
    by using the enzymes listed in the MODES.
    INPUT: DNAseq and the enzymes from sys.argv[2:]
    OUTPUT: The DNAseq is updated with the segments gained from the
    digesting
    '''
    enzymes = sys.argv[2:]

    updated_list = []
    for enzyme in enzymes:
        pattern = MODES(enzyme)
        p = re.compile(pattern)
        for dna in DNAseq.keys():
            matchlist = re.findall(p,dna)
            updated_list = re.split(MODES, DNAseq)
            DNAseq.update((key, updated_list.index(k)) for key in
            d.iterkeys())
    return DNAseq


def getMolecularWeight(dnaSequence):
    '''
    This function calculates the molWeight of the sequence in DNAseq
    INPUT: the updated DNAseq from the previous function as a dict
    OUTPUT: The DNAseq is updated with the molweight of the digested fragments
    '''

    results = []
    for seq in DNAseq.keys():
        results = sum((dnaMass[base]) for base in DNAseq[seq])
        DNAseq.update((key, results.index(k)) for key in
        d.iterkeys())
    return DNAseq


def main(argv=None):
    '''
    This function prints the results of the digested DNA sequence on in the terminal.
    INPUT: The DNAseq from the previous function as a dict
    OUTPUT: name     weight weight weight
            name2    weight weight weight
    '''
    if argv == None:
        argv = sys.argv
    if len(argv) <2:
        usage()
        return 1

    digestFragmentsWithOneEnzyme(dnaSequence())
    Genes = getMolecularWeight(digestFragmentsWithOneEnzyme())
    print ({header},{seq}).format(**DNAseq)
    return 0



if __name__ == '__main__':
    sys.exit(main())

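In the first function I am trying to create a dict from the FASTA file. The second function uses the same dict and slices the sequences with a regular expression, and finally the molecular weight is calculated.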

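My problem is that, for some reason, Python does not recognize my dict and I get this error:

NameError: DNAseq is not defined

If I create the dict outside the functions, then I do have the dict.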

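1 answer:

Answer 0 (score: 1)

You pass the dict to both functions as dnaSequence, not as DNAseq. Inside those functions the name DNAseq is never defined, which is why you get the NameError.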
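As a minimal sketch of that point (the functions `build`, `count_entries_broken` and `count_entries_fixed` and the sample data are hypothetical, not taken from the question), a name created inside one function is not visible in another, and a function only sees its argument under the parameter name it declares:

```python
def build():
    # DNAseq is local to build(); the name is gone once the function returns.
    DNAseq = {">header1": "ATCG"}
    return DNAseq


def count_entries_broken(dnaSequence):
    # The argument arrives as dnaSequence, so DNAseq does not exist here.
    return len(DNAseq)


def count_entries_fixed(dnaSequence):
    # Use the parameter name that the function actually declares.
    return len(dnaSequence)


try:
    count_entries_broken(build())
except NameError as error:
    print("NameError:", error)

print(count_entries_fixed(build()))
```

The broken variant fails only when it is called, not when it is defined, which is why the error shows up at run time.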

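Note that this is a very strange way of calling the functions. You pass the sequence to the first call of digestFragmentsWithOneEnzyme but completely ignore its result. Then you call it again to pass its result to getMolecularWeight, but that second call has no argument at all, so it would raise an error if you ever got that far.

I think what you are trying to do is: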

sequence = dnaSequence()
fragments = digestFragmentsWithOneEnzyme(sequence)
genes = getMolecularWeight(fragments)
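The same chaining can be tried in a self-contained form. This is only a sketch with simplified stand-ins: `parse_fasta`, `digest` and `fragment_lengths` are hypothetical names, the pattern `"GATC"` is an arbitrary example rather than one of the question's enzymes, and fragment length is used in place of molecular weight because the question does not show its `dnaMass` table.

```python
import re


def parse_fasta(lines):
    """Map each '>' header to its sequence; multi-line sequences are joined."""
    sequences = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line
            sequences[header] = ""
        elif header is not None:
            sequences[header] += line
    return sequences


def digest(sequences, pattern):
    """Return a new dict mapping each header to its list of fragments."""
    return {header: re.split(pattern, seq) for header, seq in sequences.items()}


def fragment_lengths(fragments):
    """Stand-in for the weight calculation: the length of each fragment."""
    return {header: [len(part) for part in parts]
            for header, parts in fragments.items()}


lines = [">header1", "ATCGAT", "CCGATC", ">header2", "GGATCC"]

# Each step receives the previous step's return value as its argument.
sequence = parse_fasta(lines)
fragments = digest(sequence, "GATC")
lengths = fragment_lengths(fragments)
print(lengths)
```

Each function works only on its parameter and returns a new dict, so nothing depends on a name defined in another function.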

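You should also avoid giving the parameter of both functions the same name as a separate function, because that hides the function name inside them. Pick a new name instead: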

def digestFragmentsWithOneEnzyme(sequence):
    ...
    for dna in sequence:

(You don't need to call keys(): iterating over a dict always iterates over its keys.)
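Both remarks can be checked with a small sketch. The bodies below are hypothetical and the sample dict is made up; only the name `dnaSequence` comes from the question.

```python
def dnaSequence():
    return {">header1": "ATCG", ">header2": "GGCC"}


def shadowed(dnaSequence):
    # Here dnaSequence is the argument (a dict), not the function above,
    # so the function of the same name cannot be called from this scope.
    return callable(dnaSequence)


def headers(sequence):
    # Iterating over a dict yields its keys, so .keys() is not needed.
    return [header for header in sequence]


print(shadowed(dnaSequence()))
print(headers(dnaSequence()) == list(dnaSequence().keys()))
```

With a distinct parameter name such as `sequence`, the function `dnaSequence` stays reachable inside the function body.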