Computing "near-multiplicity" from a list of noisy values

Time: 2018-02-14 16:18:47

Tags: python statistics

Question

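Let's say I have a list of values that share a common factor greater than 1. For example, take the factor to be 3 and form a collection of multiples of it: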

harmonicList = [3,6,6,3,3,9,27,3,15,18,9]

Now I add some noise:

from random import random
harmonicList = [v + (random() * 2 - 1) * 0.1 for v in harmonicList]

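I am looking for an algorithm that returns a number close to 1.0 when the items in the list are close to multiples of a common value, and close to 0.0 when they are not, for example when the list is a collection of prime numbers.

Is there such a measure of "near-multiplicity"?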
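Candidate measures are easier to compare if the setup above is reproducible. Below is a minimal sketch that does this. The helper name add_noise and the seed value are illustrative assumptions. The noise model is the question's own: uniform in [-0.1, 0.1) around each clean value.

```python
import random

def add_noise(values, amount=0.1, seed=None):
    """Add uniform noise in [-amount, amount) to each value.

    Hypothetical helper mirroring the question's (random() * 2 - 1) * 0.1;
    a fixed seed makes every run produce the same noisy list.
    """
    rng = random.Random(seed)
    return [v + (rng.random() * 2 - 1) * amount for v in values]

harmonicList = [3, 6, 6, 3, 3, 9, 27, 3, 15, 18, 9]
noisy_list = add_noise(harmonicList, seed=0)

# Each noisy value stays within 0.1 of its clean multiple of 3
# (the 1e-9 slack absorbs floating-point rounding).
assert all(abs(v - c) <= 0.1 + 1e-9 for c, v in zip(harmonicList, noisy_list))
print([round(v, 3) for v in noisy_list])
```

With a fixed seed, any measure from the answers can be run twice on exactly the same noisy_list.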

Why I want to solve this

I am currently trying to detect a chessboard in screenshots using the Hough transform. Sometimes the case is ideal and it works well:

[screenshot]

But sometimes it does not: [screenshot]

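I want to detect the cases with many outliers. So my idea is to compute the intersections of the detected lines and build a collection of lengths (using only horizontal or vertical lines). If the detection is good, I expect strong "harmonicity" in this collection, so I can apply such an algorithm with a threshold.

I know there may be better ways to detect a chessboard. Maybe this one is even silly, but that is how the question came up, and I find it interesting.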
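Here is one possible sketch of building the collection of lengths described above. It rests on a few assumptions. Each detected line is a (rho, theta) pair in the normal form x*cos(theta) + y*sin(theta) = rho, which is the convention OpenCV's cv2.HoughLines reports. Near-vertical lines have theta close to 0 or pi, and near-horizontal lines have theta close to pi/2. The helper names, the 5 degree tolerance and the sample lines are made up for illustration.

```python
import math
from itertools import combinations

def split_lines(lines, tol=math.radians(5)):
    """Split (rho, theta) lines into near-vertical and near-horizontal groups.

    Returns the x-intercepts (at y = 0) of near-vertical lines and the
    y-intercepts (at x = 0) of near-horizontal lines; other lines are dropped.
    """
    xs, ys = [], []
    for rho, theta in lines:
        # theta near 0 or pi: the normal points along x, so the line is vertical
        if min(theta, math.pi - theta) < tol:
            xs.append(rho / math.cos(theta))
        # theta near pi/2: the normal points along y, so the line is horizontal
        elif abs(theta - math.pi / 2) < tol:
            ys.append(rho / math.sin(theta))
    return xs, ys

def pairwise_gaps(positions):
    """All distances between pairs of parallel lines."""
    return [abs(a - b) for a, b in combinations(sorted(positions), 2)]

# Hypothetical detections: vertical lines every 40 px, two horizontal lines,
# and one spurious diagonal that gets filtered out.
lines = [(40.0 * i, 0.0) for i in range(1, 5)]
lines += [(120.0, math.pi / 2), (160.0, math.pi / 2), (100.0, 0.8)]
xs, ys = split_lines(lines)
print(pairwise_gaps(xs))  # → [40.0, 80.0, 120.0, 40.0, 80.0, 40.0]
print(pairwise_gaps(ys))  # → [40.0]
```

On a clean board every pairwise gap is a whole number of squares, so these lists can go straight into a near-multiplicity measure. A spurious line contributes gaps that are not multiples of the square size.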

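3 answers:

Answer 0 (score: 3)

Here is a Python implementation of Robert Dodier's maximum log-likelihood idea. I added a scoring function, score. It is not the one Robert Dodier described in his answer, just the sum of squared residuals (each x minus its nearest multiple). To make the score run from 0 to 1, I take the exponential of the negative of this sum of squares: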

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt


primes = np.array([2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
                   47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97])

def harmonic(divisor, size=10):
    return np.random.randint(1, 10, size=size) * divisor

def prime_sample(size=10):
    return np.random.choice(primes, size=size)

def noisy(x, amount=0.1):
    return x + (np.random.random(size=len(x)) * 2 - 1) * amount

def prob(x, mean, sd):
    return stats.norm.pdf(x, loc=mean, scale=sd)

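def score(x, multiplier, offset, kmax=20):
    "exp(-sum of squared residuals): near 1 for a good fit, near 0 otherwise"
    k = np.arange(kmax)
    means = (k * multiplier + offset)[:, None]
    # index k of the grid point k*multiplier + offset nearest to each x
    nearest_k = np.abs(x - means).argmin(axis=0)
    closest_multiple = nearest_k * multiplier
    result = np.exp(-((x - closest_multiple)**2).sum())
    return result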

def fit(x, multipliers, offsets, kmax=20, sd=0.2):
    "sd is the standard deviation of the noise"
    k = np.arange(kmax)
    M, O, K = np.meshgrid(multipliers, offsets, k, indexing='ij')
    means = (K * M + O)[..., None]
    p = prob(x, means, sd)
    # sum over the K axis, take the log, sum over x axis
    L = np.log(p.sum(axis=-2)).sum(axis=-1)
    # find the location of maximum log likelihood
    i, j = np.unravel_index(L.argmax(), L.shape)
    max_L = L[i, j]
    multiplier = multipliers[i]
    offset = offsets[j]
    return dict(loglikelihood=L, max_L=max_L,
                multiplier=multiplier, offset=offset,
                score=score(x, multiplier, offset, kmax))

multipliers = np.linspace(3, 10, 100)
offsets = np.linspace(-1.5, 1.5, 50)
X, Y = np.meshgrid(multipliers, offsets, indexing='ij')
tests = [([12,  8, 28, 20, 32, 12, 28, 16,  4, 12], 1),
         ([3, 5, 7, 11, 13, 27, 54, 57], 0),
         (noisy(harmonic(3, size=20)), 1),
         (noisy(prime_sample()), 0)]

for x, expected in tests:
    result = fit(x, multipliers, offsets, kmax=20)
    Z = result['loglikelihood']
    plt.contourf(X, Y, Z)
    plt.xlabel('multiplier')
    plt.ylabel('offset')
    plt.scatter(result['multiplier'], result['offset'], s=20, c='red')
    plt.title('score = {:g}, expected = {:g}'
              .format(result['score'], expected))
    plt.show()

For x = [12, 8, 28, 20, 32, 12, 28, 16, 4, 12]: [plot]

For x = [3, 5, 7, 11, 13, 27, 54, 57]: [plot]
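For reference, here are the quantities that fit and score compute, written out. The symbols are introduced only for illustration:

- samples x_1, ..., x_n;
- candidate multiplier m and offset o;
- noise standard deviation sigma (sd) and K = kmax;
- N(.; mu, sigma^2), the normal density that scipy.stats.norm.pdf evaluates.

fit evaluates the log-likelihood on the multipliers x offsets grid and keeps the maximizer. The code sums the K densities without a 1/K weight. That only shifts the log-likelihood by the constant n log K, so the maximizer is unaffected.

```latex
\ell(m, o) = \sum_{i=1}^{n} \log \sum_{k=0}^{K-1} \mathcal{N}\!\left(x_i;\ k m + o,\ \sigma^2\right),
\qquad (\hat m, \hat o) = \operatorname*{arg\,max}_{(m, o) \in \text{grid}} \ell(m, o)

\hat k_i = \operatorname*{arg\,min}_{0 \le k < K} \left| x_i - (k \hat m + \hat o) \right|,
\qquad \text{score} = \exp\!\left( -\sum_{i=1}^{n} \left( x_i - \hat k_i \hat m \right)^2 \right) \in (0, 1]
```

The offset is used to choose the nearest index k_i, but the residual itself is measured from k_i times m, without the offset. That matches how the code computes closest_multiple.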

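Answer 1 (score: 0)

I was secretly hoping someone would come up with a better, more elegant solution, but the snippet below can at least give some inspiration: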

import math

# harmonicList is the noisy list from the question
for i in range(1, int(round(min(harmonicList))) + 1):  # makes no sense to look at bigger numbers
    temp_sum = 0
    for item in harmonicList:
        item = item % i
        temp_sum += min(item, i - item)  # because noise can be either plus or minus
    if i == 1:
        comparator = temp_sum  # baseline: total distance to the nearest integers
    elif math.isclose(temp_sum, comparator):  # float sums: avoid exact ==
        print(i, "is a common denominator")
# prints "3 is a common denominator"

By contrast, take:

primeList = [1,3,5,7,11,13,17]

Running the same noise generator and for loop on this list prints nothing at all.

import math
from random import random

for _ in range(1000):
    primeList = [1, 3, 5, 7, 11, 13, 17]
    primeList = [v + (random() * 2 - 1) * 0.1 for v in primeList]
    for i in range(1, int(round(min(primeList))) + 1):
        temp_sum = 0
        for item in primeList:
            item = item % i
            temp_sum += min(item, i - item)
        if i == 1:
            comparator = temp_sum
        elif math.isclose(temp_sum, comparator):
            print(i, "is a common denominator")
# No prints
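The loop above can be packaged as a reusable function. This is a sketch: the name common_divisors and its abs_tol parameter are hypothetical. It compares each candidate's summed distance with the i == 1 baseline using math.isclose rather than exact float equality.

The test list below deliberately omits 1. When 1 is in the list, min(primeList) rounds to 1, so only i = 1 is ever tried. In that case nothing can be printed, whatever the other values are.

```python
import math
import random

def common_divisors(values, abs_tol=1e-9):
    """Return candidates i >= 2 whose summed distance-to-nearest-multiple
    matches the i == 1 baseline, following the loop in the answer above."""
    def total_distance(i):
        total = 0.0
        for v in values:
            r = v % i
            total += min(r, i - r)  # noise may push a value above or below a multiple
        return total

    baseline = total_distance(1)
    upper = int(round(min(values)))  # a divisor cannot exceed the smallest value
    return [i for i in range(2, upper + 1)
            if math.isclose(total_distance(i), baseline, abs_tol=abs_tol)]

rng = random.Random(42)

def noisy(xs):
    return [v + (rng.random() * 2 - 1) * 0.1 for v in xs]

print(common_divisors(noisy([3, 6, 6, 3, 3, 9, 27, 3, 15, 18, 9])))  # → [3]
print(common_divisors(noisy([3, 5, 7, 11, 13, 17])))                 # → []
```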

Answer 2 (score: 0)

Not very clean (hopefully someone will improve it), but it works:

from operator import itemgetter

def near_multiplicity(num_list):
    rounded_num_list = [round(num) for num in num_list]
    # count how many rounded values each factor >= 2 divides
    factors = {}
    for num in rounded_num_list:
        for factor in range(2, int(num) + 1):
            if int(num) % factor == 0:
                factors[factor] = factors.get(factor, 0) + 1
    sorted_factors = sorted(factors.items(), key=itemgetter(1), reverse=True)
    # no factor at all, or no factor shared by two values: not harmonic
    if not sorted_factors or sorted_factors[0][1] == 1:
        return 0
    best_factor = sorted_factors[0][0]
    # average distance of each raw value to its nearest multiple of best_factor
    noise = 0
    for num in num_list:
        distortion = num % best_factor
        noise += min(distortion, best_factor - distortion)
    average_noise = noise / len(num_list)
    return 1 - average_noise
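A caveat on the return value: each per-value distance can be as large as best_factor / 2. So 1 - average_noise is not guaranteed to stay in [0, 1], and its scale depends on the size of the factor. Below is a sketch of a variation; the name near_multiplicity_normalized is hypothetical and the normalization is my own choice, not the answer's. It keeps the same factor vote but divides each distance by best_factor / 2 before averaging.

```python
from collections import Counter

def near_multiplicity_normalized(num_list):
    """Variant of near_multiplicity (hypothetical name): same factor vote,
    but each residual is divided by best_factor / 2, so the result is in [0, 1]."""
    rounded = [int(round(num)) for num in num_list]
    # vote: how many rounded values each factor >= 2 divides
    votes = Counter(f for n in rounded for f in range(2, n + 1) if n % f == 0)
    if not votes:
        return 0.0
    best_factor, count = votes.most_common(1)[0]  # ties: first factor encountered
    if count == 1:
        return 0.0  # no factor is shared by two values
    half = best_factor / 2
    residuals = []
    for num in num_list:
        r = num % best_factor
        residuals.append(min(r, best_factor - r))  # distance to nearest multiple
    return 1 - sum(r / half for r in residuals) / len(num_list)

print(near_multiplicity_normalized([3.05, 5.97, 9.02, 26.99]))  # close to 1
print(near_multiplicity_normalized([2.9, 5.1, 7.05, 10.95]))    # → 0.0
```

Dividing by best_factor / 2 makes the score independent of scale. In the chessboard use case, that means the same threshold would apply whether the squares are 30 px or 60 px wide.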