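I'm trying to solve the following problem: when a line of my file meets a condition, print all lines starting from that line up to that line + a given value.
My code is as follows: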
```python
def round_down(num):
    return num - (num % 100000)  # round down to a 100 kb bucket to reduce the search space

infile = open('AT_rich', 'r')
cov = open('30x_good_ok_bad_0COV', 'r')  # file with non-platinum regions
platinum_region = {}  # dictionary of regions: chromosome -> bucket -> list of regions; works fast
platinum_region['chrM'] = {}
platinum_region['chrM'][0] = []
for region in infile:
    (chrom, start, end, types, length) = region.strip().split()
    start = int(start)
    end = int(end)
    length = int(length)
    rounded_start = round_down(start)
    if chrom not in platinum_region:
        platinum_region[chrom] = {}
    if rounded_start not in platinum_region[chrom]:
        platinum_region[chrom][rounded_start] = []
    platinum_region[chrom][rounded_start].append({'start': start, 'end': end, 'length': length})

for vcf_line in cov:  # process the cov file line by line
    vcf_data = vcf_line.strip().split()
    vcf_chrom = vcf_data[0]
    vcf_pos = int(vcf_data[1])
    vcf_end = int(vcf_data[2])
    coverage = int(vcf_data[3])
    rounded_vcf_position = round_down(vcf_pos)  # round positions to reduce the search space
    if vcf_chrom in platinum_region and rounded_vcf_position in platinum_region[vcf_chrom]:
        for region in platinum_region[vcf_chrom][rounded_vcf_position]:
            if vcf_pos == region['start']:  # and vcf_end == region['end']
                if vcf_chrom not in ('chrX', 'chrY'):
                    print(vcf_data)
```
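Both files contain only intervals (start, end); the first column [0] holds the chromosome, e.g. 'chr1'.
AT-rich regions (`AT_rich`, read as `infile`):
chr1 1 3 AT_rich 3
chr1 5 8 AT_rich 4
chr1 10 12 AT_rich 3
The last column is region['length'].
Coverage file (`30x_good_ok_bad_0COV`, read as `cov`):
chr1 1 2 4247
chr1 2 3 4244
chr1 3 5 4224
chr1 5 7 4251
chr1 7 8 4251
chr1 8 12 4254
chr1 12 15 4253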
The output should be:
chr1 1 2 4247
chr1 2 3 4244
chr1 5 7 4251
chr1 7 8 4251
chr1 8 12 4254  ## no exact start-to-start match here, only an overlap between the two files
chr1 12 15 4253
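So the main idea is: if a region from the AT-rich file starts at the same position as a line of the coverage file, print every line from that matching start position onward, covering the length of the AT-rich region. Sometimes there is no exact positional match, only some overlap; in that case we may not care about those lines (although it would be nice to have them in the output too).
I want to print the lines starting from vcf_data (when the condition is met) up to vcf_data + region['length']. How can I add this to my code?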
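The "only some overlap" case can be expressed as an interval-overlap test instead of an exact start match. The following is a minimal sketch, assuming half-open intervals [start, end) and that both files are already parsed into tuples; the names `overlaps` and `select_overlapping` are hypothetical and not part of the question's code.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a position."""
    return a_start < b_end and b_start < a_end


def select_overlapping(cov_rows, regions):
    """Keep coverage rows that overlap at least one region on the same chromosome.

    cov_rows: list of (chrom, start, end, coverage) tuples
    regions:  list of (chrom, start, end, length) tuples
    """
    selected = []
    for chrom, start, end, coverage in cov_rows:
        for r_chrom, r_start, r_end, _length in regions:
            if chrom == r_chrom and overlaps(start, end, r_start, r_end):
                selected.append((chrom, start, end, coverage))
                break  # one overlapping region is enough
    return selected


# Sample data copied from the question
cov_rows = [("chr1", 1, 2, 4247), ("chr1", 2, 3, 4244), ("chr1", 3, 5, 4224),
            ("chr1", 5, 7, 4251), ("chr1", 7, 8, 4251), ("chr1", 8, 12, 4254),
            ("chr1", 12, 15, 4253)]
regions = [("chr1", 1, 3, 3), ("chr1", 5, 8, 4), ("chr1", 10, 12, 3)]

for row in select_overlapping(cov_rows, regions):
    print(*row)
```

On the sample data this keeps every line of the expected output except `chr1 12 15 4253`, which touches the region 10-12 only at position 12. Treating end coordinates as inclusive would add it, but would also add `chr1 3 5 4224`. The nested loop compares every pair, so for large files the question's bucketed dictionary is still worth keeping to limit the comparisons.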
Answer 0 (score: 1)
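Add this condition to the loop:
```python
if region_count > 0:
    region_count -= 1
    print(vcf_line.rstrip())
```
Before the loop:
```python
region_count = 0
```
Inside the "condition met" branch (which must come before the new condition in the loop body):
```python
region_count = region['length']
```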
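Put together, Answer 0's countdown might look like the sketch below. It assumes the question's sample data inline instead of the two files, replaces the bucketed `platinum_region` lookup with a plain dict keyed by (chromosome, start) for brevity, and, like the answer, treats `length` as a number of lines. The function name `lines_from_match` is hypothetical.

```python
def lines_from_match(cov_lines, region_lengths):
    """Return each coverage line whose start matches a region start, followed by
    the next lines, `length` lines in total, using Answer 0's countdown.

    cov_lines:      iterable of "chrom start end coverage" text lines
    region_lengths: dict mapping (chrom, start) -> region length (hypothetical
                    stand-in for the question's platinum_region lookup)
    """
    selected = []
    region_count = 0  # lines still to emit for the current match
    for vcf_line in cov_lines:
        vcf_data = vcf_line.split()
        key = (vcf_data[0], int(vcf_data[1]))
        if key in region_lengths:  # "condition met": (re)start the countdown
            region_count = region_lengths[key]
        if region_count > 0:  # countdown: emit this line and consume one
            region_count -= 1
            selected.append(vcf_line.rstrip())
    return selected


# Sample data copied from the question
cov_lines = ["chr1 1 2 4247", "chr1 2 3 4244", "chr1 3 5 4224", "chr1 5 7 4251",
             "chr1 7 8 4251", "chr1 8 12 4254", "chr1 12 15 4253"]
region_lengths = {("chr1", 1): 3, ("chr1", 5): 4, ("chr1", 10): 3}

for line in lines_from_match(cov_lines, region_lengths):
    print(line)
```

Because `length` is used as a line count, this prints all seven sample lines, including `chr1 3 5 4224`, which the expected output omits; if the length is meant in base pairs, comparing coordinates against the region end would be closer to the intent. Since the countdown block already prints the matched line, the original `print(vcf_data)` inside the match branch should be removed, or that line is printed twice.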
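Answer 1 (score: 1)
I don't fully understand your input and output formats, but based on your description I would guess you can do something like this:
```python
lines = content.split('\n')  # content: the whole file as one string (placeholder); split into lines
for idx, line in enumerate(lines):  # iterate over the lines, with their index
    if condition(line):  # condition() and length are placeholders for your own test and span
        print('\n'.join(lines[idx:idx + length]))  # print the line range
```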
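Answer 1's slicing idea can be made concrete once the whole coverage file is in a list. A minimal sketch, assuming the file fits in memory and that region starts are looked up in a dict keyed by (chromosome, start); `slice_after_matches` and `region_lengths` are hypothetical names, and like Answer 0 the region length is treated as a number of lines.

```python
def slice_after_matches(lines, region_lengths):
    """Answer 1's idea: for each line that starts a region, take lines[idx:idx + length].

    lines:          list of "chrom start end coverage" text lines (whole file in memory)
    region_lengths: dict mapping (chrom, start) -> region length (hypothetical)
    """
    blocks = []
    for idx, line in enumerate(lines):
        fields = line.split()
        length = region_lengths.get((fields[0], int(fields[1])))
        if length is not None:  # this line starts a region
            blocks.append(lines[idx:idx + length])  # a slice past the end is simply truncated
    return blocks


# Sample data copied from the question
sample = ["chr1 1 2 4247", "chr1 2 3 4244", "chr1 3 5 4224", "chr1 5 7 4251",
          "chr1 7 8 4251", "chr1 8 12 4254", "chr1 12 15 4253"]
for block in slice_after_matches(sample, {("chr1", 1): 3, ("chr1", 5): 4}):
    print("\n".join(block))
```

On the real file the list could come from `lines = f.read().splitlines()` inside `with open('30x_good_ok_bad_0COV') as f:`. Unlike the countdown in Answer 0, slices from regions that start close together can overlap, so the same line may appear in two blocks.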