How do I use only part of a file in Python?

Asked: 2014-09-26 14:13:36

Tags: python python-2.7

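I've been trying to use conditions to print only part of a file, but for some reason, when I run the code in IPython, it just keeps running and never stops.

The file I'm running it on is: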

Use the -noinfo option to turn off this help.
Use the -help option to get a list of command line options.

pilercr v1.06
By Robert C. Edgar

Temp1.None.fasta: 523 putative CRISPR arrays found.



DETAIL REPORT



Array 1
>contig-856000000 902 nucleotides

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                      Spacer
==========  ======  ======  ======  ==========    ========================================    ======
        28      40    95.0      26  TGCTTCCCCG    -.....................................T.    CTTGGTCTTGCTGGTTCTCACCGACT
        94      40    95.0      25  CTCACCGACT    .T....................................C.    GTCAGCGTGTAGCGACTGTATCTGG
       159      40   100.0          CTGTATCTGG    ........................................    TTGCTCGAA
==========  ======  ======  ======  ==========    ========================================
         3      40              25                TAGTTGTGAATAGCTGACAAAATCATATCATATACAACAG


Array 2
>contig-2277000000 590 nucleotides

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                   Spacer
==========  ======  ======  ======  ==========    =====================================    ======
        19      37   100.0      37  GAGGGTGAGG    .....................................    ACTTTAGGTTCAAATCCGTAGAGCTGATCTGTAATAG
        93      37   100.0      37  TCTGTAATAG    .....................................    ATTCCGTTGTTGAAATAAAGTATGAATAATATTTGGT
       167      37   100.0      35  AATATTTGGT    .....................................    TTCTCGAACGTTCCATGCTTCATAATATACCTCCT
       239      37   100.0      39  TATACCTCCT    .....................................    CTGATGAATCTTACCTCGTACAGTGATGTAGCCAGGTAA
       315      37   100.0          AGCCAGGTAA    .....................................    CGTCAGTCATG
==========  ======  ======  ======  ==========    =====================================
         5      37              37                GTAGAAATGAGACGTCCGCTGTAAAGGACATTGATAC


Array 3
>contig-2766000000 540 nucleotides

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                   Spacer
==========  ======  ======  ======  ==========    =====================================    ======
       172      37   100.0      29  GTTTTAGATG    .....................................    TATCGTAGCATCCCACTCCCCTGGTGTAA
       238      37   100.0      29  CCTGGTGTAA    .....................................    GTTGGACGCGCTGCTGGACGATAGGCTGC
       304      37    97.3      29  GATAGGCTGC    T....................................    ACGCCTTACAAGCTGACCCGCGCCCAATT
       370      37   100.0          GCGCCCAATT    .....................................    GTACCTTGTTC
==========  ======  ======  ======  ==========    =====================================
         4      37              29                GGCTGTAAAAAGCCACCAAAATGATGGTAATTACAAG


SUMMARY BY SIMILARITY



Array          Sequence    Position      Length  # Copies  Repeat  Spacer  +  Consensus
=====  ================  ==========  ==========  ========  ======  ======  =  =========
    5  contig-504300000          18         364         6      33      33  +  --------------------------GTCGCT-C---CCCGCATGGGGAGCG--T-GGATTGAAAT-----
    8  contig-974700000          15         229         4      32      33  -  --------------------------GTCGCC-C---CCCATGCG-GGGGCG--T-GGATTGAAAC-----
   12  contig-759000001         464         503         8      33      34  +  --------------------------GTCGCT-C---CCTTTACGGGGAGCG--T-GGATTGAAAT-----
   16  contig-293000000          77         406         6      37      36  -  -----------------------GTAGAAATGAG---TTCCCCGATGAGAAG--G-GGATTGACAC-----
   17  contig-457600000          28         416         6      37      38  -  -----------------------GTAGAAATGGG---TGTCCCGATAGATAG--G-GGATTGACAC-----
   18  contig-527300000           1         351         6      33      32  +  -----------------------ATCGCG----C---CCCCACGGGGGCGTG--T-GAATTGAAAC-----
   27  contig-132220000          21         234         4      33      34  +  --------------------------GTCGCT-C---CCTTCACGGGGAGCG--T-GGATTGAAAT-----
   36  contig-602400000          35         304         5      33      34  -  --------------------------GTCGCC-C---CCCACGTGGGGGGCG--T-GGATTGAAAC-----
   38  contig-124860000         131         232         4      32      34  +  --------------------------GTCGCA-C---CCCTCGC-GGGTGCG--T-GGATTGAAAC-----
   54  contig-979400000         138         231         4      32      34  -  --------------------------GTCGCC-C---CTCTTGCA-GGGGCG--T-GGATTGAAAC-----
   61  contig-992000005         149         693        11      30      36  -  --------------------GTTAAAATCA--GA---CC---ATTTTG--------GGATTGAAAT-----
   68  contig-103110000          37         238         4      34      34  +  -----------------------GTCGTC----C---CCCACACGGGGGACG--T-GGATTGAAATA----
   73  contig-372900000        1627        1013        16      30      35  +  ----------------------------ATTAGAATCGTACTT--ATGTAGAATTGAAAT-----------

My code so far is:

fname = 'crispr_pilrcr_1.out'
start=False
end=False
counter = 0
for line in open(fname, 'r'): # Open up the file
    s = line.split() # Split each line into words
    if not s: continue # Remove empty lines which would otherwise cause errors
    if '==' in s[0]: continue # Removes separator lines which consist of long '=======' strings
    try:
        if s[0] == 'DETAIL': # Only start in the section which starts with 'DETAIL'
            start=True
            print 'Starting'
        if s[0] == 'SUMMARY': # Only end once this section has ended
            end=True
            print 'Ending'
        while start==True or end==False: # Whilst in the section of the PILER-CR output which provides spacer sequences 
            try:
                int(s[0])
                print s[7]
            except ValueError:
                continue
    except ValueError:
        continue

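I think the 'while' loop is probably the problem, but when I used 'and' instead of 'or', the same endless run happened.

As I said, I want to select the part of the file between 'DETAIL REPORT' and 'SUMMARY BY SIMILARITY', which is why I set flags that are meant to flip once those headings are found.

Any help you can provide would be great.

Thanks, Tom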

2 Answers:

Answer 0 (score: 3)

Consider something like:

fname = 'crispr_pilrcr_1.out'
counter = 0
printing = False
for line in open(fname, 'r'): # Open up the file
    s = line.split() # Split each line into words
    if not s: continue # Skip empty lines which would otherwise cause errors
    if '==' in s[0]: continue # Skip separator lines which consist of long '=======' strings
    try:
        if s[0] == 'DETAIL': # Only start in the section which starts with 'DETAIL'
            printing = True
            print 'Starting'
        elif s[0] == 'SUMMARY': # Only end once this section has ended
            printing = False
            print 'Ending'
        elif printing:
            try:
                # Anything you put here will only be called for the lines
                #   between DETAIL... and SUMMARY...
                pass # Placeholder so the otherwise empty block is valid Python
            except ValueError:
                continue
    except ValueError:
        continue

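Basically, you use a variable, printing, which is initialized to False, set to True when the for loop encounters "DETAIL...", and reset to False when the for loop encounters "SUMMARY...".

For lines that match neither "DETAIL..." nor "SUMMARY...", the body of the elif printing: branch is executed if printing is True (that is, for the lines between the two headings).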
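As a rough, self-contained sketch of the flag idea described above: the function name spacers_in_detail_report, the in_detail flag, and the abbreviated sample lines are made up for illustration. It also assumes, judging only from the rows shown in the question, that a full data row has exactly 7 whitespace-separated fields with the spacer last; rows without a spacer length have fewer fields and are skipped here. It uses only syntax that should behave the same on Python 2.7 and Python 3.

```python
def spacers_in_detail_report(lines):
    """Collect the spacer column from rows between the DETAIL and SUMMARY headings."""
    spacers = []
    in_detail = False  # plays the role of `printing` in the answer above
    for line in lines:
        fields = line.split()
        if not fields or '==' in fields[0]:
            continue  # skip blank lines and '=====' separator lines
        if fields[0] == 'DETAIL':
            in_detail = True
        elif fields[0] == 'SUMMARY':
            in_detail = False
        elif in_detail and fields[0].isdigit() and len(fields) == 7:
            # Assumption: full rows have 7 fields and the spacer is the last one.
            spacers.append(fields[6])
    return spacers


# Abbreviated sample based on the file in the question (repeat columns shortened).
sample = [
    'pilercr v1.06',
    'DETAIL REPORT',
    '',
    'Array 1',
    '>contig-856000000 902 nucleotides',
    '       Pos  Repeat     %id  Spacer  Left flank    Repeat    Spacer',
    '==========  ======  ======  ======  ==========    ======    ======',
    '        28      40    95.0      26  TGCTTCCCCG    -....T.    CTTGGTCTTGCTGGTTCTCACCGACT',
    '       159      40   100.0          CTGTATCTGG    .......    TTGCTCGAA',
    '==========  ======  ======  ======  ==========    ======',
    '         3      40              25                TAGTTGTG',
    'SUMMARY BY SIMILARITY',
    '    5  contig-504300000  18  364  6  33  33  +  --GTCGCT--',
]

print(spacers_in_detail_report(sample))
```

For the sample above, only the row starting with 28 qualifies, so the result is a list containing that one spacer. Because the function accepts any iterable of lines, an open file object could be passed in directly instead of the list.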

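Answer 1 (score: 1)

The problem is that you never change the values of start and end inside the while loop. So whatever values they had that let you enter the loop, they are the same on every iteration.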
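To see why swapping 'or' for 'and' in the question did not help, it may be useful to tabulate which flag values let each version of the loop start. This is only an illustrative sketch; the helper name loop_entry_table is invented here, and the two expressions are the boolean equivalents of the question's conditions.

```python
from itertools import product


def loop_entry_table():
    """Map each (start, end) pair to (enters_with_or, enters_with_and)."""
    table = {}
    for start, end in product([False, True], repeat=2):
        enters_with_or = start or not end    # condition used in the question
        enters_with_and = start and not end  # the variant the asker also tried
        table[(start, end)] = (enters_with_or, enters_with_and)
    return table


for flags, result in sorted(loop_entry_table().items()):
    print(flags, result)
```

Both flags start out False, so the 'or' condition is already true on the first non-empty line, while the 'and' condition becomes true as soon as the 'DETAIL' line sets start. In either case nothing inside the loop changes the flags, so the condition stays true once the loop has been entered.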

Without drastically changing your logic, I'd guess you want to do something like:

while start or not end:
    try:
        int(s[0])
        print s[7]
    except ValueError:
        end = True
        start = False
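If keeping the flags in sync is itself the source of trouble, one alternative, not taken from either answer, is to let itertools cut out the section so that no while loop or flags are needed at all. The sketch below assumes the two headings start at the beginning of their lines, as they do in the file shown in the question; the function name detail_section is hypothetical.

```python
from itertools import dropwhile, takewhile


def detail_section(lines):
    """Return an iterator over the lines after 'DETAIL...' and before 'SUMMARY...'."""
    # Drop everything that comes before the DETAIL heading.
    from_header = dropwhile(lambda l: not l.startswith('DETAIL'), lines)
    next(from_header, None)  # discard the heading line itself, if there is one
    # Keep lines only until the SUMMARY heading is reached.
    return takewhile(lambda l: not l.startswith('SUMMARY'), from_header)


example = ['intro', 'DETAIL REPORT', 'row a', 'row b', 'SUMMARY BY SIMILARITY', 'row c']
for line in detail_section(example):
    print(line)
```

Each line that comes out can then be split and filtered exactly as in the loop bodies above. If the DETAIL heading never appears, the iterator is simply empty.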