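So I've been trying to use conditions to print only part of a file, but for some reason, when I run the code in IPython, it just keeps running and never stops.
The file I'm running it on is: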
Use the -noinfo option to turn off this help.
Use the -help option to get a list of command line options.
pilercr v1.06
By Robert C. Edgar
Temp1.None.fasta: 523 putative CRISPR arrays found.
DETAIL REPORT
Array 1
>contig-856000000 902 nucleotides
Pos Repeat %id Spacer Left flank Repeat Spacer
========== ====== ====== ====== ========== ======================================== ======
28 40 95.0 26 TGCTTCCCCG -.....................................T. CTTGGTCTTGCTGGTTCTCACCGACT
94 40 95.0 25 CTCACCGACT .T....................................C. GTCAGCGTGTAGCGACTGTATCTGG
159 40 100.0 CTGTATCTGG ........................................ TTGCTCGAA
========== ====== ====== ====== ========== ========================================
3 40 25 TAGTTGTGAATAGCTGACAAAATCATATCATATACAACAG
Array 2
>contig-2277000000 590 nucleotides
Pos Repeat %id Spacer Left flank Repeat Spacer
========== ====== ====== ====== ========== ===================================== ======
19 37 100.0 37 GAGGGTGAGG ..................................... ACTTTAGGTTCAAATCCGTAGAGCTGATCTGTAATAG
93 37 100.0 37 TCTGTAATAG ..................................... ATTCCGTTGTTGAAATAAAGTATGAATAATATTTGGT
167 37 100.0 35 AATATTTGGT ..................................... TTCTCGAACGTTCCATGCTTCATAATATACCTCCT
239 37 100.0 39 TATACCTCCT ..................................... CTGATGAATCTTACCTCGTACAGTGATGTAGCCAGGTAA
315 37 100.0 AGCCAGGTAA ..................................... CGTCAGTCATG
========== ====== ====== ====== ========== =====================================
5 37 37 GTAGAAATGAGACGTCCGCTGTAAAGGACATTGATAC
Array 3
>contig-2766000000 540 nucleotides
Pos Repeat %id Spacer Left flank Repeat Spacer
========== ====== ====== ====== ========== ===================================== ======
172 37 100.0 29 GTTTTAGATG ..................................... TATCGTAGCATCCCACTCCCCTGGTGTAA
238 37 100.0 29 CCTGGTGTAA ..................................... GTTGGACGCGCTGCTGGACGATAGGCTGC
304 37 97.3 29 GATAGGCTGC T.................................... ACGCCTTACAAGCTGACCCGCGCCCAATT
370 37 100.0 GCGCCCAATT ..................................... GTACCTTGTTC
========== ====== ====== ====== ========== =====================================
4 37 29 GGCTGTAAAAAGCCACCAAAATGATGGTAATTACAAG
SUMMARY BY SIMILARITY
Array Sequence Position Length # Copies Repeat Spacer + Consensus
===== ================ ========== ========== ======== ====== ====== = =========
5 contig-504300000 18 364 6 33 33 + --------------------------GTCGCT-C---CCCGCATGGGGAGCG--T-GGATTGAAAT-----
8 contig-974700000 15 229 4 32 33 - --------------------------GTCGCC-C---CCCATGCG-GGGGCG--T-GGATTGAAAC-----
12 contig-759000001 464 503 8 33 34 + --------------------------GTCGCT-C---CCTTTACGGGGAGCG--T-GGATTGAAAT-----
16 contig-293000000 77 406 6 37 36 - -----------------------GTAGAAATGAG---TTCCCCGATGAGAAG--G-GGATTGACAC-----
17 contig-457600000 28 416 6 37 38 - -----------------------GTAGAAATGGG---TGTCCCGATAGATAG--G-GGATTGACAC-----
18 contig-527300000 1 351 6 33 32 + -----------------------ATCGCG----C---CCCCACGGGGGCGTG--T-GAATTGAAAC-----
27 contig-132220000 21 234 4 33 34 + --------------------------GTCGCT-C---CCTTCACGGGGAGCG--T-GGATTGAAAT-----
36 contig-602400000 35 304 5 33 34 - --------------------------GTCGCC-C---CCCACGTGGGGGGCG--T-GGATTGAAAC-----
38 contig-124860000 131 232 4 32 34 + --------------------------GTCGCA-C---CCCTCGC-GGGTGCG--T-GGATTGAAAC-----
54 contig-979400000 138 231 4 32 34 - --------------------------GTCGCC-C---CTCTTGCA-GGGGCG--T-GGATTGAAAC-----
61 contig-992000005 149 693 11 30 36 - --------------------GTTAAAATCA--GA---CC---ATTTTG--------GGATTGAAAT-----
68 contig-103110000 37 238 4 34 34 + -----------------------GTCGTC----C---CCCACACGGGGGACG--T-GGATTGAAATA----
73 contig-372900000 1627 1013 16 30 35 + ----------------------------ATTAGAATCGTACTT--ATGTAGAATTGAAAT-----------
My code so far is:
fname = 'crispr_pilrcr_1.out'
start=False
end=False
counter = 0
for line in open(fname, 'r'): # Open up the file
    s = line.split() # Split each line into words
    if not s: continue # Remove empty lines which would otherwise cause errors
    if '==' in s[0]: continue # Removes separation lines which consist of long '=======' strings
    try:
        if s[0] == 'DETAIL': # Only start in the section which starts with 'DETAIL'
            start=True
            print 'Starting'
        if s[0] == 'SUMMARY': # Only end once this section has ended
            end=True
            print 'Ending'
        while start==True or end==False: # Whilst in the section of the PILER-CR output which provides spacer sequences
            try:
                int(s[0])
                print s[7]
            except ValueError:
                continue
    except ValueError:
        continue
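I thought the 'while' loop might be the problem, but when I used 'and' instead of 'or', the same never-ending run happened.
As I said, I want to select the part of the file between 'DETAIL REPORT' and 'SUMMARY BY SIMILARITY', which is why I set up conditions that switch on once those headers are found.
Any help you can provide would be great.
Thanks, Tom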
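A side issue worth checking alongside the hang: the code prints s[7] for every row whose first field is an integer. In the sample as pasted above, where runs of spaces are collapsed, a full detail row splits into only seven fields. That means s[7] would raise IndexError, which neither except ValueError clause catches. The last row of each array and the per-array summary row are shorter still. Whether the real crispr_pilrcr_1.out splits the same way is an assumption. The Python 3 sketch below, with a hypothetical helper spacer_of, simply counts fields on rows copied from the sample:

```python
# Rows copied from the sample output in the question.
full_row = ("28 40 95.0 26 TGCTTCCCCG "
            "-.....................................T. CTTGGTCTTGCTGGTTCTCACCGACT")
last_row = "159 40 100.0 CTGTATCTGG ........................................ TTGCTCGAA"
summary_row = "3 40 25 TAGTTGTGAATAGCTGACAAAATCATATCATATACAACAG"


def spacer_of(line):
    """Return the spacer sequence of a detail row, or None if the row has none.

    Assumption (hypothetical helper): a full row splits into seven fields,
    Pos, Repeat, %id, Spacer length, Left flank, Repeat, Spacer, matching
    the column header shown in the sample.
    """
    fields = line.split()
    if len(fields) == 7 and fields[0].isdigit():
        return fields[6]  # zero-based index 6 is the seventh (last) field
    return None


for row in (full_row, last_row, summary_row):
    # Field count first, then the extracted spacer (or None).
    print(len(row.split()), spacer_of(row))
```

On these rows the counts are 7, 6 and 4, so index 7 is out of range on every one of them.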
Answer 0 (score: 3)
Consider something like:
fname = 'crispr_pilrcr_1.out'
counter = 0
printing = False
for line in open(fname, 'r'): # Open up the file
    s = line.split() # Split each line into words
    if not s: continue # Remove empty lines which would otherwise cause errors
    if '==' in s[0]: continue # Removes separation lines which consist of long '=======' strings
    try:
        if s[0] == 'DETAIL': # Only start in the section which starts with 'DETAIL'
            printing = True
            print 'Starting'
        elif s[0] == 'SUMMARY': # Only end once this section has ended
            printing = False
            print 'Ending'
        elif printing:
            try:
                # Anything you put here will only be called for the lines
                # between DETAIL... and SUMMARY...
                pass # placeholder so the block is valid Python; replace with your processing
            except ValueError:
                continue
    except ValueError:
        continue
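Basically, you use a variable, printing, which is initialized to False. It is set to True when the for loop encounters "DETAIL ..." and reset to False when the for loop encounters "SUMMARY ...".
For lines that match neither "DETAIL ..." nor "SUMMARY ...", the elif printing: block runs only if printing is True, i.e. only for the lines between the two headers.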
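The flag approach can be fleshed out as a small Python 3 sketch. The generator name spacers_between and its parameters are hypothetical. Treating the seventh field as the spacer is also an assumption, based on the sample rows as pasted in the question. Each line is examined exactly once, so there is no inner loop that could spin:

```python
def spacers_between(lines, start_word="DETAIL", stop_word="SUMMARY"):
    """Yield spacer sequences from detail rows between the two section headers.

    A flag (called printing in the answer above) is switched on by the start
    header and off by the stop header; only rows seen while it is on are used.
    """
    inside = False
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("="):
            continue  # skip blank lines and ===== separator lines
        if fields[0] == start_word:
            inside = True
        elif fields[0] == stop_word:
            inside = False
        elif inside and fields[0].isdigit() and len(fields) == 7:
            yield fields[6]  # assumed spacer column (seventh field)


# A few lines copied from the sample output in the question.
sample = """\
pilercr v1.06
DETAIL REPORT
Array 1
>contig-856000000 902 nucleotides
Pos Repeat %id Spacer Left flank Repeat Spacer
========== ====== ====== ====== ========== ======
28 40 95.0 26 TGCTTCCCCG -.....................................T. CTTGGTCTTGCTGGTTCTCACCGACT
94 40 95.0 25 CTCACCGACT .T....................................C. GTCAGCGTGTAGCGACTGTATCTGG
159 40 100.0 CTGTATCTGG ........................................ TTGCTCGAA
3 40 25 TAGTTGTGAATAGCTGACAAAATCATATCATATACAACAG
SUMMARY BY SIMILARITY
5 contig-504300000 18 364 6 33 33 + --------------------------GTCGCT-C---CCCGCATGGGGAGCG--T-GGATTGAAAT-----
""".splitlines()

for spacer in spacers_between(sample):
    print(spacer)
```

With the real file you would pass the open file handle instead, e.g. with open(fname) as handle: followed by a loop over spacers_between(handle). The column-header row is skipped because its first field, Pos, is not a number.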
答案 1 :(得分:1)
问题是您永远不会更改while循环中start
或end
的值。因此,无论它们具有哪些允许您进入循环的值,每次迭代都是相同的。
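The point can be seen in isolation with a minimal Python 3 sketch of the question's loop shape. loop_passes and its limit parameter are hypothetical, added only so the demonstration terminates. In the question's code there is a second factor: the continue in the inner except ValueError applies to the innermost loop, which is the while, not the for. So on the very first line of the file ("Use the -noinfo ..."), int(s[0]) fails and continue jumps straight back to the unchanged while condition, and the second line is never read. Swapping or for and does not help either, because once the condition is true on entry, nothing in the body can make it false.

```python
def loop_passes(start, end, limit=1000):
    """Count passes of `while start or not end:` when the body never changes start/end.

    `limit` is an artificial safety cap for this demonstration only;
    the loop in the question has no such cap, so it never exits.
    """
    passes = 0
    while start or not end:
        passes += 1  # the body never touches start or end ...
        if passes >= limit:
            break    # ... so only this guard stops the loop
    return passes


print(loop_passes(start=True, end=False))   # hits the cap
print(loop_passes(start=False, end=True))   # condition is false from the start
```

Any call whose condition is true on entry runs until the artificial cap; without the cap it would run forever.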
如果没有彻底改变你的逻辑,我猜你可能想做类似的事情:
while start or not end:
    try:
        int(s[0])
        print s[7]
        break # leave the while loop once this line is handled; otherwise it repeats forever
    except ValueError:
        end = True
        start = False
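A different technique from either answer, offered only as an alternative sketch: the standard library functions itertools.dropwhile and itertools.takewhile can cut out the lines between two headers without any flag or inner loop. The helper name section is hypothetical. The header prefixes DETAIL and SUMMARY come from the sample output. This is Python 3:

```python
from itertools import dropwhile, takewhile


def section(lines, start_prefix="DETAIL", stop_prefix="SUMMARY"):
    """Return the lines strictly between the start header and the stop header."""
    # Skip everything before the first line that starts with start_prefix.
    after_start = dropwhile(lambda line: not line.startswith(start_prefix), lines)
    next(after_start, None)  # discard the start header itself (no-op if absent)
    # Keep lines until the first one that starts with stop_prefix.
    return list(takewhile(lambda line: not line.startswith(stop_prefix), after_start))


lines = [
    "pilercr v1.06",
    "DETAIL REPORT",
    "Array 1",
    "28 40 95.0 26 TGCTTCCCCG -.....................................T. CTTGGTCTTGCTGGTTCTCACCGACT",
    "SUMMARY BY SIMILARITY",
    "5 contig-504300000 18 364 6 33 33 +",
]
for line in section(lines):
    print(line)
```

Because both functions consume a single iterator, each line is read once. The same call works on an open file handle, whose lines keep their trailing newline, which startswith does not mind. If the start header never appears, the result is an empty list.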