Question

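For an algorithm I'm benchmarking, I need to test some portion of a list (which can be very long, but is mostly filled with 0s and the occasional 1). The idea is that in a list of n items, d of which are of interest, each item is defective with probability d/n in expectation. So I check a group of size d/n (it is defined in terms of floor and log functions for information-theoretic reasons; it makes the analysis of the algorithm easier).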

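Algorithm:

1. If n <= 2*d - 2 (i.e. more than half of the list is filled with 1s), just look at each item in turn.

2. If n > 2*d - 2: check a group whose size is determined by alpha = floor(log2(l / d)), where l = n - d + 1 and d = the number of 1s (the code below uses 2**alpha items). If the group contains a 1, do a binary search on the group to find the defective, then set d = d - 1 and n = n - 1 - x (x = the size of the group minus the defective). If it does not, set n = n - groupSize and go to 1 (i.e. check the rest of the list).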
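To make the group-size rule in step 2 concrete, here is a minimal sketch. The helper name `group_size` is hypothetical, and it assumes the group holds 2**alpha items, as the code in this post does. Integer arithmetic stands in for floor(log2(l / d)) to sidestep floating-point rounding.

```python
def group_size(n, d):
    """Size of the next group to test when n > 2*d - 2 (hypothetical helper)."""
    l = n - d + 1
    # floor(log2(l / d)) equals floor(log2(l // d)) whenever l >= d,
    # and bit_length() - 1 is floor(log2(k)) for a positive integer k.
    alpha = (l // d).bit_length() - 1
    return 2 ** alpha


print(group_size(1024, 10))  # 64
print(group_size(1024, 2))   # 256
```

With n = 1024 and d = 10, l = 1015 and l / d = 101.5, so alpha = 6 and the first group covers 64 items.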

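However, when the list is filled with 10 ones at random positions, the algorithm finds all but one of them and then keeps looping while checking an empty list.

I think the problem is that when I discard a group containing all 0s, I'm not correctly updating the reference that marks where the next round starts, and this causes my algorithm to fail.

Here is the relevant part of the function: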

import math

def binary_search(inList):
    low = 0
    high = len(inList)

    while low < high:
        mid = (low + high) // 2
        upper = inList[mid:high]
        lower = inList[low:mid]
        if any(lower):
            high = mid
        elif any(upper):
            low = mid + 1
        elif mid == 1:
            return mid
        else:
            # Neither side has a 1
            return -1

    return mid

def HGBSA(inList, num_defectives):

    n = len(inList)
    defectives = []

    #initialising the start of the group to be tested        
    start = 0    

    while num_defectives > 0:
        defective = 0
        if(n <= (2*num_defectives - 2)):
            for i in inList:
                if i == 1:
                    num_defectives = num_defectives - 1
                    n = n - 1
                    defectives.append(i)
        else:
            #params to determine size of group
            l = n - num_defectives + 1
            alpha = int(math.floor(math.log(l/num_defectives, 2)))
            groupSize = 2**alpha
            end = start + groupSize
            group = inList[start:end]
            #print(groupSize)
            #print(group)
            if any(group): 
                defective = binary_search(group)
                defective = start + defective 
                defectives.append(defective)
                undefectives = [s for s in group if s != 1]
                n = n - 1 - len(undefectives)
                num_defectives = num_defectives - 1
                print(defectives)
            else:
                n = n - groupSize

            start = start + groupSize    

    print(defectives)
    return defectives

Here are the tests the function currently passes:

from GroupTesting import HGBSA

#identify a single defective
inlist = [0]*1024
inlist[123] = 1
assert HGBSA(inlist, 1) == [123]

#identify two defectives
inlist = [0]*1024
inlist[123] = 1
inlist[789] = 1
assert inlist[123] == 1
assert inlist[789] == 1
assert HGBSA(inlist, 2) == [123, 789]

zeros = [0]*1024
ones = [1, 101, 201, 301, 401, 501, 601, 701, 801, 901]
for val in ones:
    zeros[val] = 1
assert HGBSA(zeros, 10) == ones

I.e. it finds a single 1, two 1s, and ten 1s placed deterministically in the list, but this test:

from random import shuffle

zeros = [0] * 1024
ones = [1] * 10
l =  zeros + ones
shuffle(l)
where_the_ones_are = [i for i, x in enumerate(l) if x == 1] 
assert HGBSA(l, 10) == where_the_ones_are

exposes the bug.

This test also fails with the code above:

#identify two defectives next to each other
inlist = [0]*1024
inlist[123] = 1
inlist[124] = 1
assert GT(inlist, 2) == [123, 124]

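The following modification (discarding a whole group if it contains no defective, but discarding only the members before the defective when it does) passes the "two next to each other" test, but not the "10 in a row" or random tests: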

def HGBSA(inList, num_defectives):

    n = len(inList)
    defectives = []

    #initialising the start of the group to be tested        
    start = 0    

    while num_defectives > 0:
        defective = 0
        if(n <= (2*num_defectives - 2)):
            for i in inList:
                if i == 1:
                    num_defectives = num_defectives - 1
                    n = n - 1
                    defectives.append(i)
        else:
            #params to determine size of group
            l = n - num_defectives + 1
            alpha = int(math.floor(math.log(l/num_defectives, 2)))
            groupSize = 2**alpha
            end = start + groupSize
            group = inList[start:end]
            #print(groupSize)
            #print(group)
            if any(group): 
                defective = binary_search(group)
                defective = start + defective 
                defectives.append(defective)
                undefectives = [s for s in group if s != 1 in range(0, groupSize//2)]
                print(len(undefectives))
                n = n - 1 - len(undefectives)
                num_defectives = num_defectives - 1
                start = start + defective + 1
                #print(defectives)
            else:
                n = n - groupSize
                start = start + groupSize  

    print(defectives)
    return defectives

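I.e. the problem occurs when the group being tested contains more than one 1, and the ones after the first go undetected. The best test for the code to pass would be 1s distributed uniformly at random throughout the list, with every defective found.

Also, how can I write tests that catch this kind of error in the future?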
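One possible way to approach that last point is a randomized test that compares the function against a trivial linear scan on many small inputs. The sketch below is an assumption about how such a harness might look: the names `linear_scan`, `random_case` and `find_counterexample` are hypothetical, and the candidate is assumed to take `(items, d)` like HGBSA and return indices.

```python
import random


def linear_scan(items):
    """Reference oracle: the index of every 1, found by looking at each item once."""
    return [i for i, x in enumerate(items) if x == 1]


def random_case(rng, n, d):
    """Build a list of n items with exactly d ones at random positions."""
    items = [0] * n
    for i in rng.sample(range(n), d):
        items[i] = 1
    return items


def find_counterexample(candidate, trials=200, seed=0):
    """Return the first random input on which candidate disagrees with the oracle, else None."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        n = rng.randint(1, 64)
        d = rng.randint(0, n)
        items = random_case(rng, n, d)
        if sorted(candidate(items, d)) != linear_scan(items):
            return items
    return None
```

Small lists with many 1s make adjacent defectives and several defectives per group common, which are the cases the hand-written tests missed. Since the bug here shows up as an endless loop, a real harness would also want a timeout or an iteration cap around the call.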

Answer 1

Your algorithm seemingly has worse performance than a linear scan.

A naive algorithm would just scan the piece of the list of size d/n in O(d/n).

defectives = [index for (index, element) in enumerate(inList[start:end], start) if element == 1]

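Common sense says that you can't possibly detect the positions of all 1s in a list without looking at every element of the list, and there's no point in looking at any element more than once.

Your "binary search" uses any multiple times, effectively scanning pieces of the list multiple times. The same applies to constructs like if any(group): ... [s for s in group if ...], which scan group twice, the first time needlessly.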
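As an illustration of cutting down the repeated scans, here is a minimal sketch of a binary search that tests only the lower half at each step. It assumes the group is already known to contain at least one 1 (for example because any(group) was just checked); the name `first_one` is hypothetical and this is not the asker's original function.

```python
def first_one(group):
    """Index of the first 1 in group, assuming group contains at least one 1."""
    low, high = 0, len(group)
    while high - low > 1:
        mid = (low + high) // 2
        if any(group[low:mid]):
            # The lower half holds a 1, so the first 1 is in there.
            high = mid
        else:
            # The lower half is clean, so the first 1 is in the upper half.
            low = mid
    return low


print(first_one([0, 0, 1, 0]))  # 2
print(first_one([0, 1, 1, 0]))  # 1
```

Because the upper half is never tested on its own, items after the returned index remain unclassified and may still contain further 1s.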

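If you described the actual algorithm you're trying to implement, people could help troubleshoot it. From your code and your post, the algorithm is unclear. The fact that your HGBSA function is long and not exactly well commented unfortunately doesn't help understanding.

Don't be afraid to tell people here the details of what your algorithm is doing and why; we're kind of computer geeks here, too, so we're going to understand :)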
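For what it's worth, here is a rough sketch of how the bookkeeping might look if my reading of the post is right. It is a guess at the intended procedure, not a confirmed fix: the names `find_defectives` and `first_one` are hypothetical, n is derived from `start` rather than tracked separately, and after a defective is found only the items up to and including it are dropped, so the rest of the group is tested again.

```python
def first_one(group):
    """Index of the first 1 in group, assuming group contains at least one 1."""
    low, high = 0, len(group)
    while high - low > 1:
        mid = (low + high) // 2
        if any(group[low:mid]):
            high = mid
        else:
            low = mid
    return low


def find_defectives(items, num_defectives):
    """Return the indices of the 1s in items, given how many there are (sketch)."""
    found = []
    start = 0            # first index not yet classified
    d = num_defectives   # 1s still to be found
    while d > 0 and start < len(items):
        n = len(items) - start   # items still unclassified
        if n <= 2 * d - 2:
            # Dense case: look at the next item on its own.
            if items[start] == 1:
                found.append(start)
                d -= 1
            start += 1
            continue
        l = n - d + 1
        alpha = (l // d).bit_length() - 1   # floor(log2(l / d)), valid since l >= d here
        group = items[start:start + 2 ** alpha]
        if any(group):
            offset = first_one(group)
            found.append(start + offset)    # index in the full list
            d -= 1
            start += offset + 1             # keep the untested tail of the group
        else:
            start += len(group)             # the whole group is clean


    return found
```

The key difference from the posted versions is that `start` always moves to the item right after the last classified one, so a second 1 in the same group is not skipped and `start` is never added twice.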
