Question

I'm trying to write a small Python 3 program that simulates drawing samples from a population of random numbers, but it seems to show the opposite of what I expected. What am I doing wrong? It's probably something simple, but I can't see it.

import random
import statistics
import math

pcounter = 0
counter = 0
for loop in range(1000):
    l = []
    for x in range(500):
        l.append(random.randint(1,1000))

    m = statistics.mean(l)
    v = list(l)
    v[:] = [(x-m)**2 for x in v]
    realvariance = sum(v)/len(v)
    #print("Real Variance: " + str( sum(v)/len(v)))
    #print("Real Mean: " + str(m))


    sample = random.sample(l, 10)
    v = list(sample)
    #print(v)
    v[:] = [(x-m)**2 for x in v]
    samplem = statistics.mean(sample)
    samplebiasedvariance = sum(v)/len(v)
    samplevariance = sum(v)/(len(v)-1)

    print(samplebiasedvariance)
    print(samplevariance)
    print(realvariance)
    print((samplebiasedvariance - realvariance)**2 < (samplevariance - realvariance)**2)
    if (samplebiasedvariance - realvariance)**2 < (samplevariance - realvariance)**2:
        pcounter = pcounter + 1     
        print("biased Variance wins: " + str(pcounter))

    else:
        counter = counter + 1
        print("Variance wins: " + str(counter))

print("biased Variance wins: " + str(pcounter))
print("Variance wins: " + str(counter))

This results in:

biased Variance wins: 563
Variance wins: 437

But it should be the other way around: I would expect the biased variance to be worse than the unbiased variance, which is calculated using (n-1). The unbiased variance should therefore be closer to the true population variance (realvariance) more often than the biased one.

Answer 1

"Bias" is a misleading term: it suggests some kind of moral failing in what is just a mathematical formula.

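What you are looking at is essentially the mean squared error of the two variance estimators. (Whichever estimator is closer to the actual value more often will tend to have the smaller mean squared error.) It turns out that, at least for normally distributed data, the unbiased sample variance has a larger mean squared error than the usual biased sample variance, which in turn has a larger mean squared error than the sample variance computed with 1/(n + 1) instead of 1/n or 1/(n - 1).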
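
To put numbers on that ordering, here is a worked version under an assumption the answer does not state explicitly: the observations are i.i.d. normal with variance \sigma^2. The divisor c and the symbol \hat\sigma^2_c are notation introduced here for illustration only.

```latex
\hat\sigma^2_c = \frac{1}{c}\sum_{i=1}^{n}(x_i-\bar x)^2,
\qquad
\operatorname{MSE}\bigl(\hat\sigma^2_c\bigr)
  = \underbrace{\frac{2(n-1)}{c^2}\,\sigma^4}_{\text{variance}}
  + \underbrace{\Bigl(\frac{n-1}{c}-1\Bigr)^2\sigma^4}_{\text{bias}^2}

c = n-1:\ \frac{2}{n-1}\,\sigma^4,
\qquad
c = n:\ \frac{2n-1}{n^2}\,\sigma^4,
\qquad
c = n+1:\ \frac{2}{n+1}\,\sigma^4
```

For n = 10 these come to roughly 0.222, 0.190 and 0.182 times \sigma^4, so the ordering holds but the gaps are small. For non-normal data the variance term changes, and the ordering is not guaranteed.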

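If I understand correctly, if you put the 1/(n + 1) estimator into your program, you should find that it is closer to the actual value more often than either of the other two.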
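
A minimal sketch of that experiment, not the asker's program: it compares the divisors n - 1, n and n + 1 by estimated mean squared error rather than by counting wins. Several things here are assumptions of the sketch: the data are drawn from a normal distribution (the question uses random.randint), the true variance is known to be sigma**2, and the names simulate and squared_deviations are made up for illustration. Note also that it takes deviations from the sample mean, whereas the question's code subtracts the population mean m from the sampled values; with a known population mean, dividing by n is already unbiased.

```python
import random


def squared_deviations(sample):
    """Sum of squared deviations from the sample's own mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)


def simulate(n=10, trials=20000, sigma=1.0, seed=0):
    """Estimate the MSE of three variance estimators on normal samples.

    Returns a dict mapping each divisor (n - 1, n, n + 1) to its average
    squared error against the true variance sigma**2.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    true_var = sigma ** 2
    divisors = (n - 1, n, n + 1)
    total_sq_err = {d: 0.0 for d in divisors}

    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        ss = squared_deviations(sample)
        # Same sample for all three estimators, so the comparison is paired.
        for d in divisors:
            total_sq_err[d] += (ss / d - true_var) ** 2

    return {d: total / trials for d, total in total_sq_err.items()}


if __name__ == "__main__":
    for divisor, mse in simulate().items():
        print(f"divisor {divisor}: estimated MSE = {mse:.4f}")
```

Under the normality assumption the estimated errors should shrink as the divisor goes from n - 1 to n to n + 1. Swapping rng.gauss for a uniform draw is an easy way to check how much that ordering depends on the distribution.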

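This topic is discussed on the Wikipedia page for variance, under the heading "Population variance and sample variance". No doubt there are many other resources as well.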

Unbiased variance estimate (n-1) simulation in python fails
