Smoothing simulated data for a chi-squared demonstration

Time: 2020-03-24 19:27:27

Tags: python statistics precision smoothing graphing

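I'm stuck here trying to teach my students about chi-squared. I made a video that should help for the most part, but I'm having trouble producing a graph that shows the specific properties of the chi-squared distribution. The shape is right, but there is a lot of noise. This is simulated data, so it will never be perfectly smooth, but this is a bit much.

I've been trying to smooth the data. I've gone as far as rounding the data to the nearest tenth and applying a moving average (k = 3), which gives graphs like these: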

Chi-Squared Simulation df = 3, sample size = 100, samples = 100000, rounded and smoothed

Chi-Squared Simulation df = 3, sample size = 100, samples = 100000, not rounded, smoothed
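For reference, the k = 3 moving average described above can be sketched with `np.convolve`. This is only an illustration: the helper name `moving_average` and the sample counts are made up, not taken from the script below.

```python
import numpy as np

def moving_average(counts, k=3):
    """Moving average over a window of k; returns len(counts) - k + 1 values."""
    kernel = np.ones(k) / k
    # mode="valid" keeps only positions where the whole window fits,
    # so the two end points are dropped for k = 3
    return np.convolve(np.asarray(counts, dtype=float), kernel, mode="valid")

counts = [3, 6, 9, 6, 3]  # made-up counts for illustration
print(moving_average(counts))
```

A wider window (larger `k`) smooths more but also flattens the peak of the curve, so it is a trade-off rather than a fix.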

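While working on this I noticed a few things. First, the peaks and dips seem to occur at predictable positions. Second, without rounding, the graph seems to alternate regularly between spikes and troughs. I suspected this might be some kind of binary (floating-point) precision issue. I tried to fix it by switching to numpy for the arithmetic and forcing the data to float64. This had no effect.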
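One way to test the precision hypothesis is to look at which values the statistic can take at all. The sketch below assumes four equally likely colours (an assumption: the actual proportions are not shown in the question), so every expected count is 25. The observed counts are integers whose deviations from 25 sum to zero, which means the sum of squared deviations is always an even integer and the statistic is always a multiple of 2/25 = 0.08. If that holds, the regular spikes may come from this discreteness rather than from floating point: rounding a 0.08 grid to the nearest 0.1 sends two possible values to some tenths and only one to others.

```python
import numpy as np

rng = np.random.default_rng(0)

sample_size = 100
probs = np.array([0.25, 0.25, 0.25, 0.25])  # assumed proportions, not from the original script
expected = sample_size * probs

# Each row is one simulated sample of 100 balls, as counts per colour
observed = rng.multinomial(sample_size, probs, size=10_000)
chi_sq = ((observed - expected) ** 2 / expected).sum(axis=1)

# Collapse floating-point dust, then list the distinct values that occur
distinct = np.unique(np.round(chi_sq, 10))
gaps = np.diff(distinct)

print("distinct values:", len(distinct))
print("smallest gap between them:", gaps.min())
```

With other proportions the grid of possible values is less regular, but the statistic is still discrete, so a similar pattern could be expected.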

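What I'd like to know is:

  1. If this problem is caused by binary precision, how do I mitigate it properly?
  2. If it can't be fixed that way, is there a better smoothing operation I could use?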
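As a possible direction for the second point, here is a minimal sketch that bins the statistics with `np.histogram` instead of rounding, then compares the result with the theoretical chi-squared density for df = 3. It again assumes four equally likely colours, and the bin width of 0.4 is chosen only because it spans five of the possible values under that assumption. The names `chi2_pdf`, `step` and `width` are illustrative.

```python
import math
import numpy as np

def chi2_pdf(x, df):
    """Chi-squared density, written out so only numpy and math are needed."""
    x = np.asarray(x, dtype=float)
    return x ** (df / 2 - 1) * np.exp(-x / 2) / (2 ** (df / 2) * math.gamma(df / 2))

rng = np.random.default_rng(1)
sample_size, n_samples = 100, 100_000
probs = np.full(4, 0.25)  # assumed proportions, not from the original script
expected = sample_size * probs

observed = rng.multinomial(sample_size, probs, size=n_samples)
chi_sq = ((observed - expected) ** 2 / expected).sum(axis=1)

step = 0.08        # spacing of possible values under the assumption above
width = 5 * step   # each bin spans five possible values
# Edges sit halfway between possible values, so none falls on a boundary
edges = np.arange(-step / 2, 20 + width, width)
density, edges = np.histogram(chi_sq, bins=edges, density=True)
centres = (edges[:-1] + edges[1:]) / 2

theory = chi2_pdf(centres, df=3)
print("largest gap between simulation and theory:", np.abs(density - theory).max())
```

This is unlikely to remove all of the roughness, since the statistic stays discrete, but equal-width bins that each hold the same number of possible values avoid the uneven grouping that rounding to tenths introduces. Plotting `density` and `theory` against `centres` gives the simulated and theoretical curves on the same axes.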

Thanks for your help. The code is below.

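import random

import numpy as np

# redBalls, greenBalls, yellowBalls and blueBalls (the colour proportions) and
# redLim, greenLim and yellowLim (their cumulative cut-offs) are not defined in
# this excerpt; they are presumably set earlier in the script.

# Draw n samples of size sampleSize and collect the chi-square statistics
chiSqrList = []

n = 100000
sampleSize = 100

j = 0

while j < n:

    redTotal = 0
    greenTotal = 0
    yellowTotal = 0
    blueTotal = 0

    i = 0

    # Draw one sample: assign each ball a colour from a uniform random number
    while i < sampleSize:
        x = random.random()
        if x < redLim:
            redTotal += 1
        elif x < greenLim:
            greenTotal += 1
        elif x < yellowLim:
            yellowTotal += 1
        else:
            blueTotal += 1

        i += 1

    observedBalls = np.array([redTotal, greenTotal, yellowTotal, blueTotal], dtype=np.float64)
    expectedBalls = np.array([sampleSize*redBalls, sampleSize*greenBalls, sampleSize*yellowBalls, sampleSize*blueBalls], dtype=np.float64)

    # Chi-square statistic: sum of (observed - expected)^2 / expected
    chiSqr = np.power((observedBalls - expectedBalls), 2)/expectedBalls
    chiSqr = np.sum(chiSqr)

    # Round to the nearest tenth
    chiSqr = round(chiSqr, 1)

    chiSqrList.append(chiSqr)

    j += 1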

# Make count data

avgSqrDist = []
count = []

i = 0

for value in sorted(chiSqrList):  # equal values must be adjacent for the run-length count below
    if len(avgSqrDist) == 0:
        avgSqrDist.append(value)
        count.append(1)
    elif avgSqrDist[i] != value:
        avgSqrDist.append(value)
        count.append(1)
        i += 1
    else:
        count[i] += 1

# Smooth curve

i = 0
smoothAvgSqrDist = []
smoothCount = []

while i < len(avgSqrDist)-2:
    smoothCount.append((count[i]+count[i+1]+count[i+2])/3)
    smoothAvgSqrDist.append(avgSqrDist[i+1])
    i += 1
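As an aside on the counting step, the loop above only merges equal values that sit next to each other, so it relies on the list being in sorted order. A minimal alternative sketch uses `np.unique` with `return_counts=True`, which sorts and counts in one call. The list `chi_sq_list` here holds made-up values for illustration.

```python
import numpy as np

chi_sq_list = [0.3, 1.2, 0.3, 2.5, 1.2, 0.3]  # made-up values for illustration

# values comes back sorted; counts[i] is how often values[i] occurs
values, counts = np.unique(chi_sq_list, return_counts=True)

for value, count in zip(values, counts):
    print(value, count)
```

The resulting `values` and `counts` arrays play the same role as `avgSqrDist` and `count` in the code above.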

0 answers:

No answers