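I'm stuck here trying to teach my students about chi-squared. I made a video that should be very helpful, but I'm having trouble producing a graph that shows the specific properties of the chi-squared distribution. The shape is right, but there is a lot of noise. This is simulated data, so it will never be perfectly smooth, but this is a bit much.
I have been trying to smooth the data. I've gone as far as rounding the data to the nearest tenth and applying a moving average (k = 3) to get graphs like these: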
[Figure: Chi-Squared Simulation, df = 3, sample size = 100, samples = 100000, rounded and smoothed]
[Figure: Chi-Squared Simulation, df = 3, sample size = 100, samples = 100000, not rounded, smoothed]
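The k = 3 moving average mentioned above can be sketched with numpy as follows. This is only an illustration, not the code used for the plots: `moving_average` is a hypothetical helper name and the input counts are made-up numbers chosen to mimic a strict high/low alternation.

```python
import numpy as np

def moving_average(values, k=3):
    """Centered moving average; returns len(values) - k + 1 points."""
    kernel = np.ones(k) / k
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

# Made-up counts that alternate between a dip and a spike
counts = [10, 40, 10, 40, 10, 40]
smoothed = moving_average(counts, k=3)
print(smoothed)
```

With these made-up counts the smoothed values still alternate (roughly 20, 30, 20, 30), because a window of 3 always covers two points of one kind and one of the other. An even window such as k = 2 would flatten a pure two-step alternation, which may be worth trying on the real counts.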
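While working on this I noticed a few things. First, the spikes and dips seem to occur at predictable locations. Second, without rounding, the graph seems to alternate regularly between spikes and dips. I suspect this may be caused by some kind of binary precision issue. I tried to fix it by switching to numpy for the arithmetic and forcing the data to float64. This had no effect.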
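One way to probe the binary-precision idea is to compute the statistic twice, once in float64 and once in exact rational arithmetic, and compare how many distinct values each version produces. The sketch below makes several assumptions: the four equal proportions are placeholders (the real proportions are not shown in the post), and `chi_sq_float` and `chi_sq_exact` are hypothetical helper names.

```python
from fractions import Fraction
import numpy as np

sample_size = 100
probs = [Fraction(1, 4)] * 4  # placeholder proportions, not the real ones

rng = np.random.default_rng(0)
observed = rng.multinomial(sample_size, [float(p) for p in probs], size=2000)

def chi_sq_float(obs):
    """Chi-square statistic in float64, as in the simulation."""
    expected = np.array([sample_size * float(p) for p in probs])
    return float(np.sum((obs - expected) ** 2 / expected))

def chi_sq_exact(obs):
    """Same statistic in exact rational arithmetic (no rounding error)."""
    return sum((int(o) - sample_size * p) ** 2 / (sample_size * p)
               for o, p in zip(obs, probs))

float_values = {chi_sq_float(o) for o in observed}
exact_values = {chi_sq_exact(o) for o in observed}
print(len(float_values), len(exact_values))
```

If the float set turns out larger than the exact set, some values that are mathematically equal differ in their last bits, and counting unrounded values would split them into separate points. The exact set also shows that, for a fixed sample size, the statistic can only land on a limited grid of values (multiples of 1/25 under these placeholder proportions), which would put spikes at predictable locations regardless of float precision.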
What I would like to know is:
Thank you for your help. The code is below.
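import random
import numpy as np

# redBalls, greenBalls, yellowBalls and blueBalls (the category proportions) and
# redLim, greenLim and yellowLim (the cumulative thresholds) are assumed to be
# defined earlier in the script; they are not shown here.

# Draw n samples of size sampleSize and collect the chi-square statistics
chiSqrList = []
n = 100000
sampleSize = 100
for j in range(n):
    redTotal = 0
    greenTotal = 0
    yellowTotal = 0
    blueTotal = 0
    # Draw one sample: assign each ball to a colour by its uniform draw
    for i in range(sampleSize):
        x = random.random()
        if x < redLim:
            redTotal += 1
        elif x < greenLim:
            greenTotal += 1
        elif x < yellowLim:
            yellowTotal += 1
        else:
            blueTotal += 1
    observedBalls = np.array([redTotal, greenTotal, yellowTotal, blueTotal], dtype=np.float64)
    expectedBalls = np.array([sampleSize*redBalls, sampleSize*greenBalls, sampleSize*yellowBalls, sampleSize*blueBalls], dtype=np.float64)
    # Chi-square statistic: sum of (observed - expected)^2 / expected
    chiSqr = np.sum(np.power((observedBalls - expectedBalls), 2)/expectedBalls)
    # Round to the nearest tenth (the "rounded" variant of the plot)
    chiSqr = round(chiSqr, 1)
    chiSqrList.append(chiSqr)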
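# Make count data
# The loop below only merges adjacent equal values, so the list must be
# sorted first (harmless if it is already sorted).
chiSqrList.sort()
avgSqrDist = []  # distinct chi-square values, in ascending order
count = []       # number of samples at each distinct value
i = 0            # index of the value currently being counted
for value in chiSqrList:
    if len(avgSqrDist) == 0:
        avgSqrDist.append(value)
        count.append(1)
    elif avgSqrDist[i] != value:
        avgSqrDist.append(value)
        count.append(1)
        i += 1
    else:
        count[i] += 1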
# Smooth curve
smoothAvgSqrDist = []
smoothCount = []
for i in range(len(avgSqrDist) - 2):
    smoothCount.append((count[i] + count[i+1] + count[i+2]) / 3)
    smoothAvgSqrDist.append(avgSqrDist[i+1])
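
For comparison, the simulation and counting steps above can be written more compactly with numpy. This is a sketch under stated assumptions, not a drop-in replacement: `simulate_chi_square` is a hypothetical helper, the equal proportions are placeholders for the proportions not shown in the post, and it draws counts with numpy's `Generator.multinomial` instead of the `random.random()` loop, so it will not reproduce the same random draws.

```python
import numpy as np

def simulate_chi_square(probs, sample_size=100, n=100000, seed=None):
    """Return n chi-square statistics from multinomial samples."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=np.float64)
    # One row of category counts per simulated sample: shape (n, len(probs))
    observed = rng.multinomial(sample_size, probs, size=n)
    expected = sample_size * probs
    return np.sum((observed - expected) ** 2 / expected, axis=1)

# Placeholder proportions (assumption): four equally likely colours
chi_sq = simulate_chi_square([0.25, 0.25, 0.25, 0.25], seed=1)

# Distinct rounded values and how often each occurs (sorted ascending)
values, counts = np.unique(np.round(chi_sq, 1), return_counts=True)
print(len(values), counts.sum())
```

`np.unique` with `return_counts=True` sorts the values before counting, so it does not depend on the order in which the statistics were generated.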