python: Why do I get different results when counting frequencies within intervals?

Date: 2017-03-08 07:58:14

Tags: python-2.7 intervals

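The following code is a program that counts frequencies within intervals of unequal length in a large dataset. The two lists, "snp" and "bin_list", are test data. I have to structure my code the way the program below does.

My problem is that the results differ depending on whether I use "continue" or "snp.remove(site)" in the code.

With "continue" in the code, I get the following result: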

Potri.001G000300up1k 26
Potri.001G000400down1k 26
Potri.001G000300part2 5

However, with "snp.remove(site)" in the code, I get a different result:

Potri.001G000300up1k 26
Potri.001G000400down1k 25
Potri.001G000300part2 5

In fact, the first result is correct but slow, while the second result is slightly wrong but fast.

So my question is: how can I fix the error when using "snp.remove(site)" in the code?

I am using Python 2.7.12.

Note: I have to iterate over the "snp" list in every loop.

#!/usr/bin/env python

def locateBin(Start, End, site):
    return site >= Start and site <= End

snp = ['17', '24', '31', '36', '38', '43', '45', '50', '52', '58', '86', '224', '306', '369', '663', '665', '668', '740', '811', '844', '891', '942', '1059', '1097', '1186', '1371', '1437', '1458', '1487', '1537', '1571', '1720', '1853', '2066', '2238', '2292', '2296', '2332', '2367', '2387', '2483', '2585', '2772', '2856', '2935', '2944', '2966', '2967', '2991', '2992', '3048', '3166', '3211', '3241', '3280', '3350', '3351', '3367', '3373', '3378', '3406', '3449', '3454', '3533', '3573', '3621', '3623', '3643', '3644', '3697', '3745', '3757', '3822', '3867', '3893', '3949', '4094', '4142', '4149', '4260', '4457', '4462', '4511', '4528', '4535', '4622', '4719', '4722', '4775', '4790', '4801', '4863', '4873', '4879', '4928', '5044', '5454', '5498', '5557', '5584', '5805', '6215', '6231', '6243', '6293', '6346', '6365', '6401', '6421', '6616', '6812', '6861', '6925', '7023', '7126', '7341', '7342', '7369', '7412', '7413', '7483', '7501', '7645', '7679', '7681', '7799', '7828', '7896', '7928', '7944', '7950', '7971', '8002', '8003', '8038', '8058', '8092', '8134', '8213', '8224', '8275', '8292', '8323', '8378', '8444', '8481', '8498', '8499', '8504', '8556', '8616', '8660', '8676', '8710', '8773', '8817', '9158', '9228', '9232', '9302', '9321', '9340', '9383', '9429', '9538', '9602', '9691', '9723', '9880', '9914', '10044', '10046', '10068', '10073', '10176', '10192', '10237', '10241', '10300', '10368', '10618', '10742', '10835', '10959', '11025', '11028', '11260', '11275', '11528', '11912', '11986', '12062', '12095', '12347', '12366', '12513', '12560', '12592', '12648']

bin_list = [['Potri.001G000300up1k', 'Chr01', '7391', '8391'], ['Potri.001G000400down1k', 'Chr01', '7391', '8391'], ['Potri.001G000300part2', 'Chr01',  '8625', '8860']]


index = 0
count_list = []

while index < len(bin_list):
    num = 0
    el = bin_list[index]
    for site in snp:
        if int(site) < int(el[2]):
            continue
            #snp.remove(site)
        elif locateBin(int(el[2]), int(el[3]), int(site)):
            num += 1
        else:
            count_list.append([el[0], num]) 
            break
    index += 1

for line in count_list:
    print("%s\t%s\n" % (line[0], line[1])),

1 Answer:

Answer 0 (score: 2)

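You generally should not modify a list while iterating over it. When "snp.remove(site)" deletes the current element, the remaining elements shift one position to the left, but the loop still advances to the next index, so the element that moved into the current position is skipped. That is how a site inside the interval can be missed, which gives 25 instead of 26. A simple fix is to iterate over a copy of the list:

    for site in snp[:]:

snp[:] creates a copy of the list, so removing items from snp no longer disturbs the loop.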
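To make the skipping visible, here is a minimal sketch with small made-up data, followed by one way the copy-based fix could be applied to the counting loop. The function name count_in_bins and the sample values are illustrative assumptions, not part of the original program, and removing sites this way is only safe if the bins are ordered by start position, as they are in the test data.

```python
# Removing from the list being iterated skips the element after each removal.
nums = [1, 2, 3, 4, 10]
for n in nums:
    if n < 5:
        nums.remove(n)
# 2 and 4 were never visited, so they survive.
print(nums)

# Iterating over a copy visits every element, while the removals hit the original.
fixed = [1, 2, 3, 4, 10]
for n in fixed[:]:
    if n < 5:
        fixed.remove(n)
print(fixed)


def count_in_bins(snp, bin_list):
    """Count sites per bin (hypothetical helper, assumes bins sorted by start)."""
    sites = list(snp)  # private copy, so the caller's list is left untouched
    counts = []
    for name, _chrom, start, end in bin_list:
        start, end = int(start), int(end)
        num = 0
        for site in sites[:]:  # loop over a snapshot, mutate the real list
            pos = int(site)
            if pos < start:
                sites.remove(site)  # site lies before this bin and all later ones
            elif pos <= end:
                num += 1
            else:
                break  # past the end of the bin
        # Appending after the loop also records a bin when the sites run out
        # before a value beyond the bin is seen (the original skips that case).
        counts.append([name, num])
    return counts


demo = count_in_bins(
    ['1', '2', '3', '5', '6', '9'],
    [['a', 'c', '3', '6'], ['b', 'c', '3', '6'], ['c', 'c', '9', '9']],
)
for name, num in demo:
    print("%s\t%s" % (name, num))
```

Copying the list on every pass costs extra time and memory, so this trades some of the speed gained by removing sites for correctness.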