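I built a nested dictionary from my file to group events into classes. I want to use the keys to count how many classes I have and how many final values there are in total. This is my code so far: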
infile = open('ALL','r')
def round_down(num):
    return num - (num%100)
count = 0
a = []
split_region = {}
lengths = []
for region in infile:
    #print region
    (cov,chrm,pos,end,leng) = region.split()
    start = int(pos)#-1#-int(leng) ## loosen conditions about break points
    end = int(end)
    lengths = int(leng)
    coverage=int(cov)
    rounded_start=round_down(start)
    rounded_length=round_down(lengths)
    if not (chrm in split_region):
        split_region[chrm]={}
    if not (rounded_start in split_region[chrm]):
        split_region[chrm][rounded_start]={}
    if not (rounded_length in split_region[chrm][rounded_start]):
        split_region[chrm][rounded_start][rounded_length]= []
    split_region[chrm][rounded_start][rounded_length].append({'start':start,'length':lengths,'cov':coverage})
for k,v in split_region[chrm][rounded_start].items():
    print len(v),k,v
    a.append(len(v))
    count +=1
print count
print sum(a)
The file is formatted like this:
5732 chrM 1 16572 16571
804 chr6 58773612 58780166 6554
722 chr1 142535435 142538993 3558
448 chrY 13447747 13451695 3948
372 chr9 68422753 68423813 1060
327 chr2 133017433 133018716 1283
302 chr18 107858 109884 2026
256 chr20 29638813 29641416 2603
206 chr6 57423087 57429121 6034
204 chr1 142537237 142538991 1754
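So it basically rounds the numbers down to the nearest 100 and uses the result to create a class in my dictionary. It is nested because I group first by the rounded start and then by the rounded length.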
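To illustrate the keys, here is a small sketch (written in Python 3 syntax, unlike my script) that takes two rows from the sample and computes the class each one lands in. The `class_keys` list exists only for this illustration and is not part of my script.

```python
def round_down(num):
    # Round down to the nearest multiple of 100.
    return num - (num % 100)

# Two rows taken from the sample file: cov, chrm, pos, end, leng
rows = [
    "804 chr6 58773612 58780166 6554",
    "206 chr6 57423087 57429121 6034",
]

# Illustration only: collect the (chrm, rounded start, rounded length) key per row.
class_keys = []
for row in rows:
    cov, chrm, pos, end, leng = row.split()
    class_keys.append((chrm, round_down(int(pos)), round_down(int(leng))))

for key in class_keys:
    print(key)
# ('chr6', 58773600, 6500)
# ('chr6', 57423000, 6000)
```

Both rows are on chr6, but their rounded starts differ, so they end up in two different classes.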
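At the end of the code I try to count how many classes there are and how many values I have in total. But this gives the wrong output: there are more lines in the input file than classes. Any ideas how to fix this?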
Answer 0 (score: 0)
It's not clear to me which totals you want, but maybe you are looking for one of the following:
rounded_start_count = 0
rounded_length_count = 0
rounded_length_value_count = 0
for k1, v1 in split_region.items():
    print k1 + ": " + str(len(v1))
    rounded_start_count += len(v1)
    for k2, v2 in v1.items():
        rounded_length_count += len(v2)
        # v2 maps each rounded length to a list of events; count the events themselves
        rounded_length_value_count += sum(len(events) for events in v2.values())
print ""
print "chrm count: ", len(split_region.keys())
print "Rounded start count: ", rounded_start_count
print "Rounded length count: ", rounded_length_count
print "Rounded length value count: ", rounded_length_value_count
This would go after your for loop. For your sample data it prints the following output:
chr6: 2
chr2: 1
chr1: 2
chr9: 1
chrY: 1
chr20: 1
chrM: 1
chr18: 1

chrm count: 8
Rounded start count: 10
Rounded length count: 10
Rounded length value count: 10
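If you would rather build the structure without the explicit `if not ... in ...` checks, here is a minimal sketch using `collections.defaultdict`. It is written in Python 3 syntax (the code in the question is Python 2), and the helper names `group_regions` and `count_classes_and_values` are my own, not something from your script. It walks every level of the dictionary, so the value total should match the number of input lines.

```python
from collections import defaultdict

def round_down(num):
    # Round down to the nearest multiple of 100, as in the question.
    return num - (num % 100)

def group_regions(lines):
    """Group rows as chrm -> rounded start -> rounded length -> list of events."""
    split_region = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cov, chrm, pos, end, leng = line.split()
        start, length = int(pos), int(leng)
        split_region[chrm][round_down(start)][round_down(length)].append(
            {'start': start, 'length': length, 'cov': int(cov)})
    return split_region

def count_classes_and_values(split_region):
    """Return (number of innermost classes, number of stored events)."""
    class_count = 0
    value_count = 0
    for starts in split_region.values():
        for lengths in starts.values():
            class_count += len(lengths)
            value_count += sum(len(events) for events in lengths.values())
    return class_count, value_count

# The sample rows from the question.
sample = """5732 chrM 1 16572 16571
804 chr6 58773612 58780166 6554
722 chr1 142535435 142538993 3558
448 chrY 13447747 13451695 3948
372 chr9 68422753 68423813 1060
327 chr2 133017433 133018716 1283
302 chr18 107858 109884 2026
256 chr20 29638813 29641416 2603
206 chr6 57423087 57429121 6034
204 chr1 142537237 142538991 1754""".splitlines()

regions = group_regions(sample)
class_count, value_count = count_classes_and_values(regions)
print(class_count, value_count)  # 10 10
```

On a real file you could pass the open file object instead of `sample`, since the function only iterates over lines.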