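Hi all, I'm posting links to the files I'm working with, because my code fails on them (although, surprisingly, it works on small test files).
x50: https://www.dropbox.com/s/woykd2gad8jwpil/simccs_Results
infile: https://www.dropbox.com/s/5t4hmx120jvnrpk/bwasw_ccs3xtypes
Here is the code I wrote: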
import os
import collections

def validate(infile):
    def round_down(num):
        return num - (num%100000)
    x50 = open('/home/irek/Desktop/res/errors/simccs_Results','r')
    infile = open(infile, 'r')
##    out = open(outfile,'w')
    x50_region = {}
    ct=0
    lengths = []
    for region in x50:
        (name,start,NM,subs,ins,dels,leng) = region.strip().split()
        start = int(start)
        rounded_start=round_down(start)
        lengths = int(leng)
        if not (rounded_start in x50_region):
            x50_region[rounded_start]=[]
        x50_region[rounded_start].append({'name':name,'start':start,'nm':NM,'subs':subs,'ins':ins,'dels':dels,'length':lengths})
    c=0
    countt = 0
    countf = 0
    countfn = 0
    special = collections.Counter()
    rows = {}
    lol = 0
    for line in infile:
        if (c % 1000 == 0):print "c ",c
        c=c+1
        if 'unmapped' in line:
            countfn += 1
        else:# 'amb' in line:
            moleculo_data = line.strip().split()
            name2 = moleculo_data[0]
            start2 = int(moleculo_data[1])
            NM2 = int(moleculo_data[2])
            subs2 = int(moleculo_data[3])
            ins2 = int(moleculo_data[4])
            dels2 = int(moleculo_data[5])
            leng2 = int(moleculo_data[6])
            types2 = moleculo_data[7]
            counts = int(moleculo_data[8])
            rounded_start2 = round_down(start2)
            overlapping='false'
            if rounded_start2 in x50_region: #and name2 in x50_region[rounded_start]:
                for region in x50_region[rounded_start2]:
                    if name2 == region['name']:
                        if start2 >= region['start']-5 and start2 <= region['start']+5:
                            overlapping='true'
            if types2 == 'amb':
                if rounded_start2 in x50_region:
                    for region in x50_region[rounded_start2]:
                        if name2 == region['name']:
                            if start2 >= region['start']-5 and start2 <= region['start']+5:
                                continue
                            else:
                                special[name2]+=1
                                rows[name2] = name2
                                for k,v in special.iteritems():
                                    if int(v) == counts:
                                        countfn +=1
                                        lol +=1
##                                        print k,counts,v,moleculo_data[8],lol
                else:
                    special[name2]+=1
                    rows[name2] = name2
                    for k,v in special.iteritems():
                        if int(v) == counts:
                            countfn +=1
                            lol += 1
            if overlapping == 'true':
                countt +=1
            elif overlapping == 'false':
                countf +=1
                if types2 == 'uniq':
                    countfn += 1
##            print name2,start2,types2,counts,region['start'],lol
    print countt,countf,countfn,lol

infile = '/home/irek/Desktop/res/types/ccs/bwasw_ccs3xtypes'
validate(infile)
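The script compares the two files and checks whether the positions match. Depending on the result, it counts how often each event occurs in the variables countt, countf, countfn and lol.
All but one of the counters report correct values, but countfn (and with it lol) comes out far too large (it should not be greater than 97018).
A few words on what each variable should count:
countt - += 1 if any position from infile (with the same name) matches x50
countf - += 1 if any position from infile (with the same name) does not match x50
countfn - += 1 if the position in infile is uniq; += 1 if the position in infile is amb, when none of the positions for the given name match
Can anyone point out where the error in my code is?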
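To make the intended countfn rule concrete, here is a minimal sketch on toy data. It is written in Python 3 syntax and is not the script above: the function name count_false_negatives, the tuple layout of reads and the truth dictionary are all made up for illustration. It also assumes that the last column of infile (counts) is the number of candidate positions listed for a read name, and that a uniq read only counts when it does not match.

```python
import collections

def count_false_negatives(reads, truth, tolerance=5):
    """Count reads that never hit their true position (hypothetical helper).

    reads: iterable of (name, start, read_type, n_positions) tuples,
           where read_type is 'uniq' or 'amb'.
    truth: dict mapping a read name to a list of true start positions.
    """
    misses = collections.Counter()  # non-matching amb positions seen per name
    countfn = 0
    for name, start, read_type, n_positions in reads:
        matched = any(abs(start - t) <= tolerance for t in truth.get(name, []))
        if matched:
            continue
        if read_type == 'uniq':
            countfn += 1
        elif read_type == 'amb':
            misses[name] += 1
            # Look only at the tally for the current name, so each name
            # can contribute at most one false negative.
            if misses[name] == n_positions:
                countfn += 1
    return countfn

# Toy data (invented): r1 matches, r2 is a uniq miss,
# r3 misses at both amb positions, r4 matches at one of its two positions.
truth = {'r1': [100], 'r2': [500], 'r3': [900], 'r4': [1000]}
reads = [
    ('r1', 103, 'uniq', 1),
    ('r2', 700, 'uniq', 1),
    ('r3', 10, 'amb', 2),
    ('r3', 20, 'amb', 2),
    ('r4', 50, 'amb', 2),
    ('r4', 1002, 'amb', 2),
]
print(count_false_negatives(reads, truth))  # prints 2
```

In this sketch the tally is checked for one name at a time, so a name whose tally has already reached its counts value is not counted again when later lines are read.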