Unbiased variance estimate (n-1) simulation in python fails

Time: 2018-04-20 00:57:15

Tags: python statistics

I was trying to write a little program in Python 3 that simulates sampling from random numbers. But it seems to show the opposite of what I intended. What am I doing wrong? It must be extremely easy, but I don't get it.

import random
import statistics
import math

pcounter = 0
counter = 0
for loop in range(1000):
    l = []
    for x in range(500):
        l.append(random.randint(1,1000))

    m = statistics.mean(l)
    v = list(l)
    v[:] = [(x-m)**2 for x in v]
    realvariance = sum(v)/len(v)
    #print("Real Variance: " + str( sum(v)/len(v)))
    #print("Real Mean: " + str(m))


    sample = random.sample(l, 10)
    v = list(sample)
    #print(v)
    v[:] = [(x-m)**2 for x in v]
    samplem = statistics.mean(sample)
    samplebiasedvariance = sum(v)/len(v)
    samplevariance = sum(v)/(len(v)-1)

    print(samplebiasedvariance)
    print(samplevariance)
    print(realvariance)
    print((samplebiasedvariance - realvariance)**2 < (samplevariance - realvariance)**2)
    if (samplebiasedvariance - realvariance)**2 < (samplevariance - realvariance)**2:
        pcounter = pcounter + 1     
        print("biased Variance wins: " + str(pcounter))

    else:
        counter = counter + 1
        print("Variance wins: " + str(counter))

print("biased Variance wins: " + str(pcounter))
print("Variance wins: " + str(counter))

This results in:

biased Variance wins: 563
Variance wins: 437

But it should be the other way around: I would expect the biased variance to be worse than the unbiased variance that is calculated using (n-1). The unbiased one should therefore be closer to the true population variance (realvariance) more often than the biased one.

1 answer:

Answer 0 (score: 1)

"Bias" is a misleading term: it suggests some kind of moral problem in a mathematical formula.

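What you are looking at is essentially the mean squared error of the two variance estimators. (Whichever one is closer to the actual value will have the smaller mean squared error.) It turns out that the unbiased sample variance has a greater mean squared error than the usual biased sample variance, which in turn has a greater mean squared error than the sample variance computed with 1/(n + 1) instead of 1/n or 1/(n - 1).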
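The ordering can be written out explicitly. This is only a sketch under an assumption the answer does not state: the data are i.i.d. normal with variance \sigma^2. The symbols S (sum of squared deviations from the sample mean) and d (the divisor) are introduced here for illustration.

```latex
S = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
\mathbb{E}[S] = (n-1)\,\sigma^2, \qquad
\operatorname{Var}[S] = 2(n-1)\,\sigma^4

\operatorname{MSE}\!\left(\frac{S}{d}\right)
  = \frac{\operatorname{Var}[S]}{d^2}
  + \left(\frac{\mathbb{E}[S]}{d} - \sigma^2\right)^2
  = \sigma^4\left[\frac{2(n-1)}{d^2} + \left(\frac{n-1}{d} - 1\right)^2\right]

\operatorname{MSE}\!\left(\frac{S}{n-1}\right) = \frac{2\,\sigma^4}{n-1}, \qquad
\operatorname{MSE}\!\left(\frac{S}{n}\right) = \frac{(2n-1)\,\sigma^4}{n^2}, \qquad
\operatorname{MSE}\!\left(\frac{S}{n+1}\right) = \frac{2\,\sigma^4}{n+1}
```

For n = 10 these come to roughly 0.222, 0.190 and 0.182 times \sigma^4, so the unbiased estimator has the largest mean squared error of the three even though it is the only one with zero bias.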

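If I understand correctly, if you put the 1/(n + 1) estimator into your program, you should find that it is closer to the actual value than the other two.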
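A minimal sketch of that experiment, with some assumptions that are not in the question: it draws normally distributed samples with random.gauss (the mean squared error ordering is a known result for normal data; whether it carries over to the uniform integers from randint is not checked here), and it centres each sample on its own mean, whereas the question's code subtracts the mean m of the full list. The function name simulate_mse and its parameters are made up for illustration.

```python
import random


def simulate_mse(n=10, trials=20000, sigma=1.0, seed=0):
    """Estimate the mean squared error of S/(n-1), S/n and S/(n+1).

    S is the sum of squared deviations from the sample mean.
    Assumes normal data with known true variance sigma**2.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_var = sigma ** 2
    divisors = {"n-1": n - 1, "n": n, "n+1": n + 1}
    sq_err = {name: 0.0 for name in divisors}

    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n  # the sample's own mean, not the population mean
        s = sum((x - mean) ** 2 for x in sample)
        for name, d in divisors.items():
            sq_err[name] += (s / d - true_var) ** 2

    # Average squared error over all trials for each divisor.
    return {name: total / trials for name, total in sq_err.items()}


mse = simulate_mse()
for name, value in mse.items():
    print(f"divisor {name}: MSE ~ {value:.4f}")
```

Comparing average squared errors over many trials is a steadier measure than counting per-trial "wins", since a single sample of 10 values can easily land either way.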

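This topic is discussed on the Wikipedia page for variance, under the heading "Population variance and sample variance". No doubt there are many other resources as well.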