Question

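I wrote some Python code to compute the standard deviation of a list of numbers. I checked my answer against Excel, and it seems to be off. I'm not sure whether I missed a step or whether I should be worried, but if anyone has time to look over the code and spot an error, please let me know. Thank you.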

city_population = [2123,1284,7031,30788,147,2217,10000]

mean = sum(city_population,0.0)/len(city_population)

def stdev(city_population):
    length = len(city_population)
    total_sum = 0
    for i in range(length):
        total_sum += pow((city_population[i]-mean),2)
        result = (total_sum/(length-1))
        return sqrt(result)
stan_dev = stdev(city_population)
print "The standard deviation is",(stan_dev)

Output: The standard deviation is 9443.71609738

Excel: 9986.83890663

Answer 1

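Your problem comes mainly from the code inside the loop that accumulates the sum. Within that loop you also compute the result on every iteration and then return from the function. This means only one iteration of the loop ever runs.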
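To see the effect of the misplaced return in isolation, here is a minimal sketch written for Python 3. The function names are hypothetical and not from the question; the only difference between the two functions is the indentation of the return statement.

```python
def sum_with_early_return(values):
    """Buggy version: `return` is indented inside the loop body."""
    total = 0
    for v in values:
        total += v
        return total  # leaves the function during the first iteration


def sum_all(values):
    """Fixed version: `return` runs only after the loop has finished."""
    total = 0
    for v in values:
        total += v
    return total


data = [2123, 1284, 7031]
print(sum_with_early_return(data))  # 2123 (only the first element)
print(sum_all(data))                # 10438
```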

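When I run your code I get 2258.72114877, which is computed from the first value only. Changing the code to the following produces the correct sample standard deviation: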

from math import sqrt

city_population = [2123,1284,7031,30788,147,2217,10000]

mean = sum(city_population,0.0)/len(city_population)

def stdev(city_population):
    length = len(city_population)
    total_sum = 0
    for i in range(length):
        total_sum += pow((city_population[i]-mean),2)
    # total_sum is 698158659.4285713
    result = (total_sum/(length-1))
    # result is 116359776.57142855
    # sqrt(result) is 10787.01889177119
    return sqrt(result)

stan_dev = stdev(city_population)
print "The standard deviation is",(stan_dev)

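The reason this new result differs from Excel's value is that Excel returned the population standard deviation, which divides by the number of values rather than by that number minus one. As a quick reference, the following page may be useful: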

https://statistics.laerd.com/statistical-guides/measures-of-spread-standard-deviation.php
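If it helps to see both variants side by side, the following sketch uses Python 3's standard-library statistics module (not mentioned in the question, so treat it as one possible cross-check rather than the required approach). stdev divides by n - 1 and pstdev divides by n.

```python
import statistics

city_population = [2123, 1284, 7031, 30788, 147, 2217, 10000]

sample_sd = statistics.stdev(city_population)       # divides by n - 1
population_sd = statistics.pstdev(city_population)  # divides by n

print(round(sample_sd, 2))      # 10787.02
print(round(population_sd, 2))  # 9986.84, the value Excel reported
```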

If you are not required to write the code from scratch, I suggest using NumPy to avoid reinventing the wheel: http://www.numpy.org/. With it, your code becomes:

import numpy
city_population = [2123,1284,7031,30788,147,2217,10000]
numpy.std(city_population, ddof=1)

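A few other tips: to avoid confusion and potential problems later, try not to give function parameters the same names as global variables. Also try not to rely, inside a function, on variables that were set earlier outside it (as you do with "mean" here).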
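As a rough illustration of those two tips, here is one way the function could be made self-contained in Python 3. The parameter name values and the ddof argument (borrowed from NumPy's naming) are my own choices, not part of the original code.

```python
from math import sqrt


def stdev(values, ddof=1):
    """Standard deviation of `values`.

    ddof=1 gives the sample standard deviation (divide by n - 1),
    ddof=0 gives the population standard deviation (divide by n).
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof data points")
    mean = sum(values) / n  # computed locally, no global `mean` needed
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return sqrt(squared_deviations / (n - ddof))


city_population = [2123, 1284, 7031, 30788, 147, 2217, 10000]
print(stdev(city_population))          # sample standard deviation
print(stdev(city_population, ddof=0))  # population standard deviation
```

Because the mean is computed inside the function, calling it on a different list cannot silently reuse a mean computed for some other data.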

Answer 2

The problem is that you have the return inside the loop!

The following should work:

def stdev(city_population):
    length = len(city_population)
    total_sum = 0
    for i in range(length):
        total_sum += pow((city_population[i]-mean),2)
    result = (total_sum/(length))
    return sqrt(result)

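To get the population standard deviation rather than the sample standard deviation, you need to divide by length instead of length-1 (dividing by length-1 is for when you have a sample, not the entire population).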
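For reference, the two definitions can be written out as follows, using n for the number of values and x̄ for their mean. The numeric check uses the sum of squared deviations, 698158659.43, reported in the first answer.

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}
  = \sqrt{\frac{698158659.43}{6}} \approx 10787.02
  \quad\text{(sample)}

\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}
  = \sqrt{\frac{698158659.43}{7}} \approx 9986.84
  \quad\text{(population)}
```

The second value agrees with the figure from Excel quoted in the question.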

Python standard deviation check
