Wrong count in dictionary

Time: 2014-03-11 12:24:27

Tags: python dictionary count

Hi everyone, I am linking the files I am working with, because my code fails on them (although, surprisingly, it works on a small test file).

X50: https://www.dropbox.com/s/woykd2gad8jwpil/simccs_Results

infile: https://www.dropbox.com/s/5t4hmx120jvnrpk/bwasw_ccs3xtypes

This is the code I wrote:

import os
import collections


def validate(infile):
    def round_down(num):
        return num - (num%100000)



    x50 = open('/home/irek/Desktop/res/errors/simccs_Results','r')

    infile = open(infile, 'r')
    ##    out = open(outfile,'w')


    x50_region = {}
    ct=0
    lengths = []
    for region in x50:
        (name,start,NM,subs,ins,dels,leng) = region.strip().split()

        start = int(start)
        rounded_start=round_down(start)
        lengths = int(leng)
        if not (rounded_start in x50_region):
            x50_region[rounded_start]=[]

        x50_region[rounded_start].append({'name':name,'start':start,'nm':NM,'subs':subs,'ins':ins,'dels':dels,'length':lengths})

    c=0
    countt = 0
    countf = 0
    countfn = 0
    special = collections.Counter()
    rows = {}
    lol = 0
    for line in infile:
        if (c % 1000 == 0):print "c ",c
        c=c+1
        if 'unmapped' in line:
            countfn += 1
        else:# 'amb' in line:
            moleculo_data = line.strip().split()

            name2 = moleculo_data[0]
            start2 = int(moleculo_data[1])
            NM2 = int(moleculo_data[2])
            subs2 = int(moleculo_data[3])
            ins2 = int(moleculo_data[4])
            dels2 = int(moleculo_data[5])
            leng2 = int(moleculo_data[6])
            types2 = moleculo_data[7]
            counts = int(moleculo_data[8])




            rounded_start2 = round_down(start2)
            overlapping='false'
            if rounded_start2 in x50_region: #and name2 in x50_region[rounded_start]:
                for region in x50_region[rounded_start2]:
                    if name2 == region['name']:
                        if start2 >= region['start']-5 and start2 <= region['start']+5:
                            overlapping='true'
            if types2 == 'amb':
                if rounded_start2 in x50_region:
                    for region in x50_region[rounded_start2]:
                        if name2 == region['name']:

                            if start2 >= region['start']-5 and start2 <= region['start']+5:
                                continue
                            else:
                                special[name2]+=1
                                rows[name2] = name2
                                for k,v in special.iteritems():
                                    if int(v) == counts:
                                        countfn +=1
                                        lol +=1
##                                        print k,counts,v,moleculo_data[8],lol
                else:
                    special[name2]+=1
                    rows[name2] = name2
                    for k,v in special.iteritems():
                        if int(v) == counts:
                            countfn +=1
                            lol += 1


            if overlapping == 'true':
                countt +=1
            elif overlapping == 'false':
                countf +=1
                if types2 == 'uniq':
                    countfn += 1
##                    print name2,start2,types2,counts,region['start'],lol


    print countt,countf,countfn,lol


infile = '/home/irek/Desktop/res/types/ccs/bwasw_ccs3xtypes'
validate(infile)

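The script compares the two files and checks whether the positions match. Depending on the result, it counts how often each event occurs in the variables countt, countf, countfn and lol.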
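One detail of the position check that may or may not matter for these files: the lookup only searches the 100000-wide bucket that start2 itself falls into, so two positions within the +/-5 tolerance that straddle a bucket boundary are never compared. The sketch below reuses round_down from the script; candidate_buckets and the tol parameter are hypothetical helpers introduced only for illustration.

```python
def round_down(num):
    # Same bucketing as in the script: floor to a multiple of 100000.
    return num - (num % 100000)


def candidate_buckets(start, tol=5):
    # Hypothetical helper: every bucket that could hold a region
    # whose start lies within +/-tol of `start`.
    return sorted({round_down(start - tol),
                   round_down(start),
                   round_down(start + tol)})


# 99998 and 100002 are only 4 apart, yet they land in different buckets,
# so a lookup keyed on round_down(100002) alone would miss the region at 99998.
print(round_down(99998), round_down(100002))
print(candidate_buckets(100002))
```

Looping over candidate_buckets(start2) instead of the single key rounded_start2 would cover the boundary case, at the cost of one extra dictionary lookup for the few positions near a boundary.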

All of them except one report correct values, but countfn (or the lol value) comes out too large (it should not be greater than 97018).

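A few words on what each variable should count:
countt - += 1 if any position from infile (with the same name) matches x50
countf - += 1 if any position from infile (with the same name) does not match x50
countfn - += 1 if the position in infile is uniq, and += 1 if the position in infile is amb, when the positions for the given name do not match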
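A possible source of the inflated countfn, judging only from the code above and not verified against the linked files: after special[name2] is incremented, the script loops over every entry in special and bumps countfn and lol for each name whose tally equals the current row's counts, not just for name2. The sketch below contrasts the two behaviours on made-up (name, counts) rows; the function names and the data are assumptions for illustration.

```python
import collections


def count_all_names(rows):
    # Mimics the loop in the script: on every row, scan the whole Counter.
    special = collections.Counter()
    hits = 0
    for name, counts in rows:
        special[name] += 1
        for _, seen in special.items():
            if seen == counts:
                hits += 1
    return hits


def count_current_name(rows):
    # Alternative: compare only the tally of the name on the current row.
    special = collections.Counter()
    hits = 0
    for name, counts in rows:
        special[name] += 1
        if special[name] == counts:
            hits += 1
    return hits


# Made-up data: three mismatching 'amb' rows, each name expected once.
rows = [("a", 1), ("b", 1), ("c", 1)]
print(count_all_names(rows))     # 6: earlier names are counted again on later rows
print(count_current_name(rows))  # 3: one hit per name
```

With the full scan, the number of hits grows with the number of names already seen, which would match a count that looks fine on a small test file and overshoots on a large one. The sketch uses items() so it runs on Python 2 and 3; the script itself uses the Python 2 iteritems().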

Can anyone point out where the error in my code is?

0 answers:

No answers