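Trying to print the first 20 lines of a PSL file

Date: 2014-05-23 21:40:05

Tags: python bioinformatics

I have this code that reads a PSL file. For clarity I've pasted the whole thing below, but what I'm really focused on right now is the readPSLpairs method (at the very bottom). For now I just want that method to print the first 20 lines of the PSL file, but for some reason it doesn't... I get no error message or anything; the program just produces no output. What am I missing? Thanks.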

import sys
class PSLreader :
    '''
    Class to provide reading of a file containing psl alignments
    formatted sequences:
    object instantiation:
    myPSLreader = PSLreader(<file name>):

    object attributes:
    fname: the initial file name

    methods:
    readPSL() : reads psl file, yielding those alignments that are within the first or last
                1000 nt

    readPSLpairs() : yields psl pairs that support a circular hypothesis 

    Author: David Bernick
    Date: May 12, 2013
    '''

    def __init__ (self, fname='EEV14-Cb.filtered.psl'):
        '''constructor: saves attribute fname'''

        self.fname = fname

    def doOpen (self):
        if self.fname == '':   # compare by value; "is" tests object identity
            return sys.stdin
        else:
            return open(self.fname)

    def readPSL (self):
        '''
        using the filename given in init, yields each filtered psl record
        that contain alignments that are within the terminal 1000nt of
        the target. Incomplete psl records are discarded.
        If filename was not provided, stdin is used.

        This method selects alignments that may be part of a
        circle.

        Illumina pairs aligned to the top strand would have read1(+) and read2(-).
        For the bottom strand, read1(-) and read2(+).

        For potential circularity,
        these are the conditions that can support circularity:
        read1(+) near the 3' terminus
        read1(-) near the 5' terminus
        read2(-) near the 5' terminus
        read2(+) near the 3' terminus

        so...
        any read(+) near the 3', or
        any read(-) near the 5'

        '''

        nearEnd = 1000   # this constant determines "near the end"
        with self.doOpen() as fileH:

            for line in fileH:
                pslList = line.split()
                if len(pslList) < 17:
                    continue
                tSize = int(pslList[14])
                tStart = int(pslList[15])
                strand = str(pslList[8])

                if strand.startswith('+') and (tSize - tStart > nearEnd):
                    continue
                elif strand.startswith('-') and (tStart > nearEnd):
                    continue

                yield line

    def readPSLpairs (self):
        i = 0
        for psl in self.readPSL():
            if i>20:
                print(psl.split())
                i+=1
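The keep/skip rule that the readPSL docstring summarizes (any read(+) near the 3' end, or any read(-) near the 5' end) can be pulled out into a small predicate and tested on its own. This is an illustrative sketch, not part of the original code: the names passes_terminal_filter and make_psl_line are hypothetical, the records it tests are synthetic, and it assumes the same column positions readPSL uses (strand at index 8, tSize at index 14, tStart at index 15).

```python
def passes_terminal_filter(line, near_end=1000):
    """Return True if a PSL line would be yielded by PSLreader.readPSL.

    Mirrors the checks in readPSL: lines with fewer than 17 fields are
    dropped; '+' alignments are kept only if tStart lies within near_end nt
    of the target's 3' end (tSize - tStart <= near_end); '-' alignments are
    kept only if tStart lies within near_end nt of the 5' end.
    """
    fields = line.split()
    if len(fields) < 17:
        return False
    strand = fields[8]          # same index readPSL uses for strand
    t_size = int(fields[14])    # same index readPSL uses for tSize
    t_start = int(fields[15])   # same index readPSL uses for tStart
    if strand.startswith('+'):
        return t_size - t_start <= near_end
    if strand.startswith('-'):
        return t_start <= near_end
    return True  # readPSL does not filter other strand values either


def make_psl_line(strand, t_size, t_start):
    """Hypothetical helper: build a synthetic 21-column PSL line ('0' elsewhere)."""
    fields = ['0'] * 21
    fields[8] = strand
    fields[14] = str(t_size)
    fields[15] = str(t_start)
    return '\t'.join(fields)


print(passes_terminal_filter(make_psl_line('+', 5000, 4500)))  # True: 500 nt from the 3' end
print(passes_terminal_filter(make_psl_line('+', 5000, 100)))   # False
print(passes_terminal_filter(make_psl_line('-', 5000, 100)))   # True: 100 nt from the 5' end
```

Applied to a real file, something like sum(passes_terminal_filter(line) for line in open('your_file.psl')) shows whether readPSL has anything to yield at all (the file name is a placeholder).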

EDIT: So I tried adding a main function like this:

def main():
    new_psl = PSLreader(fname)
    new_psl.readPSLpairs()#creating class objects
    new_psl.output()

main()

But that still didn't get the code working... the error says: "NameError: global name 'fname' is not defined"

1 answer:

Answer 0 (score: 0)

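First, the i>20 check in readPSLpairs is why nothing gets printed. i starts at 0, and i+=1 is indented inside the if i>20: block, so the condition is never true and i never changes: no line is ever printed, no matter how long your file is. (Even with the increment moved out of the if, i>20 would skip the first 21 records instead of printing the first 20.) To print the first 20, test if i < 20: and increment i on every pass through the loop. Second, check how many lines actually get through the checks in readPSL (the ones of the form if condition: continue); if fewer lines survive them than you expect, you will see fewer lines printed. Adding a few print statements inside readPSL will show this.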
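A minimal sketch of the fix, assuming the goal is to print the first 20 records that readPSL yields. It is written as a standalone function (print_first_n is a hypothetical name) so it runs without the class, and it uses itertools.islice in place of a hand-maintained counter, which removes the chance of misplacing the increment. Inside PSLreader, readPSLpairs could simply call print_first_n(self.readPSL()).

```python
from itertools import islice


def print_first_n(records, n=20):
    """Print the whitespace-split fields of the first n records.

    Works on any iterable, including a generator such as PSLreader.readPSL().
    Returns the number of records printed, which is less than n if the
    iterable runs out first.
    """
    printed = 0
    for psl in islice(records, n):  # islice stops after n items
        print(psl.split())
        printed += 1                # counted on every record, not inside a branch
    return printed


# Stand-in for readPSL() output: 25 made-up lines.
fake_records = ('rec%d\tfield' % k for k in range(25))
print(print_first_n(fake_records))  # prints 20 rows of fields, then 20
```

Because islice stops asking the generator for items after n, readPSL does not go on to consume the rest of the file. To count every record that survives the filters instead, sum(1 for _ in PSLreader('your_file.psl').readPSL()) works (the file name is a placeholder).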

Also, if this is your whole program, there is no main section that actually calls this method; make sure your code is actually being run!
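The NameError in the edit comes from main() referring to fname, which is never assigned inside it. Note also that PSLreader as posted defines no output method, so new_psl.output() would raise an AttributeError next. Below is a hedged sketch of an entry point (assuming Python 3) that takes the file name from the command line and uses the standard if __name__ == '__main__': guard, so the code really runs when the script is executed. To stay self-contained it prints the raw first n lines rather than calling the class; in the question's script the body would instead be PSLreader(fname).readPSLpairs(). The layout and usage message are assumptions, not the original author's code.

```python
import sys
from itertools import islice


def main(argv=None, n=20):
    """Print the first n raw lines of the PSL file named on the command line.

    argv defaults to sys.argv[1:]. Returns the number of lines printed.
    """
    if argv is None:
        argv = sys.argv[1:]
    if not argv:
        print('usage: python <script> <file.psl>', file=sys.stderr)
        return 0
    fname = argv[0]  # assign fname before using it: this is what the NameError was about
    with open(fname) as fh:
        lines = list(islice(fh, n))  # read at most n lines
    for line in lines:
        print(line.split())
    return len(lines)


if __name__ == '__main__':
    main()
```

Run it as python script.py your_file.psl (both names are placeholders). If this prints lines but readPSLpairs still prints nothing, the problem is in the loop or the readPSL filters rather than in reading the file.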