Question

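I have a list of numbers and want to compute its standard deviation. I compute the value in two different ways: (1) with Python's statistics module and (2) with the standard deviation formula. The two results are different numbers, although they are fairly close. Does the statistics module compute the standard deviation differently, or is something wrong with my own code? I also don't know how math.sqrt() works internally, but I assume it uses some kind of approximation.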

import statistics
import math    

def computeSD_S(variable):
    # Open the file and read the values in the column specified
    var_list = openAndReadVariable(variable)
    # Try to compute the standard deviation using the statistics module and return an error message if a string is used as input
    try:
        st_dev = statistics.stdev(var_list)
        return st_dev
    except TypeError:
        return 'Variable values must be numerical.'

def computeSD_H(variable):
    # Open the file and read the values in the column specified
    var_list = openAndReadVariable(variable)
    squared_diff_sum = 0
    # Try to compute the standard deviation using this formula and return an error message if a string is used as input
    try:
        # Find the mean
        mean = statistics.mean(var_list)
        # Sum the squared differences
        for obs in var_list:
            squared_diff_sum += (obs-mean)**2
        # Take the square root of the sum divided by the number of observations
        st_dev = math.sqrt(squared_diff_sum/len(var_list))
        return st_dev
    except TypeError:
        return 'Variable values must be numerical.'

variable = 'Total Volume'
st_dev = computeSD_S(variable)
print('Standard Deviation', st_dev)
st_dev = computeSD_H(variable)
print('Standard Deviation', st_dev)

Output:

Standard Deviation 3453545.3553994712
Standard Deviation 3453450.731237387

Besides computing the mean with the statistics module, I also computed it by hand and got the same result.

Answer 1

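The what and the why:

Your own algorithm divides by the number of elements in the array rather than by the number of elements minus 1.

Now, why should you divide by N-1 rather than N?

This post seems to have a good explanation, and you can find plenty of other resources explaining why the standard deviation formula divides by N-1 rather than N.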
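
For reference, the two formulas differ only in the divisor. The symbols below are chosen here for illustration and do not come from the post: x̄ is the mean of the N values, s is the sample standard deviation and σ is the population standard deviation.

```latex
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2}, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2}, \qquad
\frac{s}{\sigma} = \sqrt{\frac{N}{N-1}}
```

The two numbers printed in the question have a ratio of about 1.0000274, which is roughly what this relation would give for a list of around 18,000 values. That suggests the gap comes from the divisor, not from any approximation inside math.sqrt().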

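If we look at the documentation for stdev, we can see:

statistics.stdev(data, xbar=None)

Return the sample standard deviation (the square root of the sample variance).

It computes the sample standard deviation (that is, with division by N-1). Solution 1 is to change the divisor in your own function so that it matches stdev.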
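
A minimal sketch of Solution 1, assuming the values are already in a plain Python list. The function name sample_sd_by_hand and the sample data are made up for illustration; the file-reading helper from the question is left out.

```python
import math
import statistics

def sample_sd_by_hand(values):
    """Sample standard deviation: divide the squared deviations by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = statistics.mean(values)
    squared_diff_sum = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_diff_sum / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative data
print(sample_sd_by_hand(data))   # hand-written formula with N - 1
print(statistics.stdev(data))    # should agree up to floating-point rounding
```

With the divisor changed to N - 1, the hand-written result should match statistics.stdev up to floating-point rounding.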

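Solution 2 is to replace stdev with pstdev:

statistics.pstdev(data, mu=None)

Return the population standard deviation (the square root of the population variance).

pstdev computes the population standard deviation, in other words, exactly what your current function computes.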
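
A short sketch of Solution 2, again on made-up data. The function name population_sd_by_hand is hypothetical; it mirrors the divide-by-N formula from the question so it can be compared with statistics.pstdev.

```python
import math
import statistics

def population_sd_by_hand(values):
    """Population standard deviation: divide the squared deviations by N."""
    mean = statistics.mean(values)
    squared_diff_sum = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_diff_sum / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative data
n = len(data)

print(population_sd_by_hand(data))  # divide-by-N formula, as in the question
print(statistics.pstdev(data))      # population standard deviation from the module

# The sample and population values differ by the factor sqrt(N / (N - 1)).
print(statistics.pstdev(data) * math.sqrt(n / (n - 1)))
print(statistics.stdev(data))
```

The first two printed values should agree, and so should the last two, which shows that the only difference between stdev and pstdev is the divisor.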

Standard deviation returned by Python's statistics module differs from the hand-computed value
